Abstract

Distributed consensus and other linear systems with stochastic system matrices $W_k$ emerge in various
settings, such as opinion formation in social networks, rendezvous of robots, and distributed inference in
sensor networks. The matrices $W_k$ are often random, due to, e.g., random packet dropouts in wireless
sensor networks. Key in analyzing the performance of such systems is studying the convergence of the matrix
products $W_k W_{k-1} \cdots W_1$. In this paper, we find the exact exponential rate $I$ for the convergence in
probability of the product of such matrices when time $k$ grows large, under the assumption that the $W_k$'s
are symmetric and independent identically distributed in time. Further, for commonly used random models
such as gossip and link failure, we show that the rate $I$ is found by solving a min-cut problem and is,
hence, easily computable. Finally, we apply our results to optimally allocate the sensors' transmission
power in consensus+innovations distributed detection.
I. Introduction

Linear systems with stochastic system matrices $W_k$ find applications in sensor [1], multi-robot [2],
and social networks [3]. For example, in modeling opinion formation in social networks [3], individuals
set their new opinion to the weighted average of their own opinion and the opinions of their neighbors.
These systems appear both as autonomous, like consensus or gossip algorithms [4], and as input-driven
algorithms, like consensus+innovations distributed inference [5]. Frequently, the system matrices are
random, as, for example, in consensus in wireless sensor networks, due either to the use of a randomized
protocol like gossip [4], or to link failures (random packet dropouts). In this paper, we determine the exact
convergence rate of products of random, independent identically distributed (i.i.d.) general symmetric
stochastic¹ matrices $W_k$, see Section III. In particular, our results apply to gossip and link failure. For example,
with gossip on a graph $G$, each realization of $W_k$ has the sparsity structure of the Laplacian matrix of a
one-link subgraph of $G$, with the positive entries being arbitrary, but assumed bounded away from zero.

When studying the convergence of the products $W_k W_{k-1} \cdots W_1$, it is well known that, when the modulus
of the second largest eigenvalue of $\mathbb{E}[W_k]$ is strictly less than 1, this product converges to $J := \frac{1}{N} 1 1^\top$
almost surely [6] and, thus, in probability, i.e., for any $\epsilon > 0$,

$$\mathbb{P}\left( \| W_k \cdots W_1 - J \| \geq \epsilon \right) \to 0 \quad \text{when } k \to \infty, \tag{1}$$

where $\| \cdot \|$ denotes the spectral norm. This probability converges exponentially fast to zero with $k$ [7],
but, so far as we know, the exact convergence rate has not yet been computed. In this work, we compute
the exact exponential rate of decay of the probability in (1).

Contributions. Assuming that the non-zero entries of $W_k$ are bounded away from zero, we compute the
exact exponential decay rate of the probability in (1) by solving with equality (rather than lower and
upper bounds) the corresponding large deviations limit, for every $\epsilon > 0$:

$$\lim_{k \to \infty} \frac{1}{k} \log \mathbb{P}\left( \| W_k \cdots W_1 - J \| \geq \epsilon \right) = -I, \tag{2}$$

where the convergence rate $I > 0$. Moreover, we characterize the rate $I$ and show that it does not depend
on $\epsilon$. Our results reveal that the exact rate $I$ is solely a function of the graphs induced by the matrices $W_k$
and the corresponding probabilities of occurrences of these graphs. In general, the computation of the
rate $I$ is a combinatorial problem. However, for special important cases, we can get particularly simple
expressions. For example, for gossip on a connected tree, the rate is equal to $|\log(1 - p_{ij})|$, where $p_{ij}$
is the probability of the link that is least likely to occur. Another example is with symmetric structures,
like uniform gossiping and link failures over a regular graph, for which we show that the rate $I$ equals
$|\log p_{\mathrm{isol}}|$, where $p_{\mathrm{isol}}$ is the probability that a node is isolated from the rest of the network. For gossip
with more general graph structures, we show that the rate $I = |\log(1 - c)|$, where $c$ is the min-cut value
(or connectivity [8]) of a graph whose links are weighted by the gossip link probabilities; the higher the
connectivity $c$ is (the more costly or, equivalently, the less likely it is to disconnect the graph), the larger
the rate $I$ and the faster the convergence. Similarly, with link failures on general graphs, the rate is
computed by solving a min-cut problem and is computable in polynomial time.

¹ By stochastic, we mean a nonnegative matrix whose rows sum to 1. Doubly stochastic matrices have, besides row sums, also column
sums equal to 1.
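The min-cut characterization for gossip can be illustrated with a small numerical sketch. The code below is not the authors' implementation: it is a brute-force search over node bipartitions (adequate only for small graphs, whereas a polynomial-time min-cut algorithm would be used in practice), and the function name `gossip_rate` and the example link probabilities are assumptions made for illustration.

```python
import math
from itertools import combinations


def gossip_rate(num_nodes, link_probs):
    """Return (c, I): the min-cut value c and the rate I = |log(1 - c)|.

    link_probs maps an undirected link (i, j) to its gossip probability.
    Brute force over all node bipartitions; intended for small graphs only.
    """
    nodes = list(range(num_nodes))
    best = math.inf
    # Keep node 0 on one side so that each bipartition is visited once;
    # the side containing node 0 never covers all nodes.
    for size in range(0, num_nodes - 1):
        for rest in combinations(nodes[1:], size):
            side = {0, *rest}
            cut = sum(p for (i, j), p in link_probs.items()
                      if (i in side) != (j in side))
            best = min(best, cut)
    rate = math.inf if best >= 1.0 else abs(math.log(1.0 - best))
    return best, rate


# Hypothetical example 1: gossip on a path (a tree) with unequal link probabilities.
tree_c, tree_rate = gossip_rate(4, {(0, 1): 0.5, (1, 2): 0.3, (2, 3): 0.2})

# Hypothetical example 2: uniform gossip on a 4-cycle (a 2-regular graph).
cycle_c, cycle_rate = gossip_rate(
    4, {(0, 1): 0.25, (1, 2): 0.25, (2, 3): 0.25, (0, 3): 0.25})
```

On the tree, the min cut is the probability of the least likely link, in line with the tree formula above. On the cycle, the min cut isolates a single node, so $1 - c$ coincides with the isolation probability of a node.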

We now explain the intuition behind our result. To this end, consider the probability in (1) when $\epsilon = 1$²,
i.e., when the norm of $W_k \cdots W_1 - J$ stays equal to 1. This happens only if the supergraph of all the
graphs associated with the matrix realizations $W_1, \ldots, W_k$ is disconnected. Motivated by this insight, we
define the set of all possible graphs induced by the matrices $W_k$, i.e., the set of realizable graphs, and
introduce the concept of a disconnected collection of such graphs. For concreteness, we explain this here
assuming gossip on a connected tree with $M$ links. For gossip on a connected tree, the set of realizable
graphs consists of all one-edge subgraphs of the tree (and thus is of size $M$). If any fixed $j$ graphs, $1 \leq j < M$,
are removed from this collection, the supergraph of the remaining graphs is disconnected; this
collection of the remaining graphs is what we call a disconnected collection. Consider now the event
that all the graph realizations (i.e., activated links) from time $t = 1$ to time $t = k$ belong to a fixed
disconnected collection, obtained, for example, by removal of one fixed one-edge graph. Because the
network then consists of two isolated components, the norm of $W_k \cdots W_1 - J$ stays, under this event,
equal to 1. The probability of this event is $(1 - p)^k$, where we assume that the links occur with the same
probability $p = \frac{1}{M}$. Similarly, if all the graph realizations belong to a disconnected collection obtained
by removal of $j$ fixed one-edge graphs, for $1 < j < M$, the norm remains at 1, but now with probability
$(1 - jp)^k$. For any event indexed by $j$ from this graph removal family of events, the norm stays at
1 in the long run, but what determines the rate is the most likely of all such events. In this case, the
most likely event is that a single one-edge graph remains missing from time 1 to time $k$, the probability
of which is $(1 - p)^k$ for each fixed link (and at most $M(1 - p)^k$ over all $M$ links; the factor $M$ does not
affect the exponential rate), yielding the value of the rate $I = |\log(1 - p)|$. This insight that the rate $I$ is
determined by the probability of the most likely disconnected collection of graphs extends to the general
matrix process.

² It turns out, as we will show in Section III, that the rate does not depend on $\epsilon$. Remark also that, because the matrices $W_k$
are stochastic, the spectral norm of $W_k \cdots W_1 - J$ is less than or equal to 1 for all realizations of $W_1, \ldots, W_k$. Thus, the probability
in (1) is equal to 0 for $\epsilon > 1$.
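The tree example can be checked numerically. The following sketch assumes gossip on a tree with $M$ equiprobable links and computes, by inclusion-exclusion, the exact probability that at least one link has not been activated by time $k$ (equivalently, that the supergraph is still disconnected). The function names and the values of $M$ and $k$ are illustrative assumptions, not part of the original analysis.

```python
import math


def prob_some_link_missing(M, k):
    """P(at least one of M equiprobable links never activates in k steps).

    Inclusion-exclusion over the sets of j links that stay inactive;
    a fixed set of j links stays inactive with probability ((M - j) / M)**k.
    """
    total = 0.0
    for j in range(1, M + 1):
        total += (-1) ** (j + 1) * math.comb(M, j) * ((M - j) / M) ** k
    return total


def empirical_rate(M, k):
    """Finite-k estimate -(1/k) log P of the rate I."""
    return -math.log(prob_some_link_missing(M, k)) / k


M = 4                                   # hypothetical number of tree links
limit_rate = abs(math.log(1.0 - 1.0 / M))
gap_short = abs(empirical_rate(M, 50) - limit_rate)
gap_long = abs(empirical_rate(M, 500) - limit_rate)
```

For growing $k$, the estimate approaches $|\log(1 - p)|$ with $p = 1/M$; the remaining gap is of order $(\log M)/k$, which is the contribution of the combinatorial factor counting the possible missing links.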

Review of the literature. There has been a large amount of work on linear systems driven by stochastic
matrices. Early work includes [9], [10], and the topic received renewed interest in the past decade [11],
[12]. Reference [12] analyzes convergence of the consensus algorithm under deterministic time-varying
matrices $W_k$. Reference [4] provides a detailed study of the standard gossip model, which has been further
modified, e.g., in [13], [14]; for a recent survey, see [15]. Reference [6] analyzes convergence under
random matrices $W_k$ that are not necessarily symmetric, and are ergodic, hence not necessarily independent in
time. Reference [16] studies the effects of delays, while reference [17] studies the impact of quantization.
Reference [18] considers random matrices $W_k$ and addresses the issue of the communication complexity
of consensus algorithms. The recent reference [19] surveys consensus and averaging algorithms and
provides tight bounds on the worst case averaging times for deterministic time-varying networks. In
contrast with consensus (averaging) algorithms, consensus+innovations algorithms include both a local
averaging term (consensus) and an innovation term (measurement) in the state update process. These
algorithms find applications in distributed inference in sensor networks, see, e.g., [5], [20], [21] for
distributed estimation, and, e.g., [22], [23], [24], for distributed detection. In this paper, we illustrate the
usefulness of the rate of consensus $I$ in the context of consensus+innovations algorithms by optimally
allocating the transmission power of sensors for distributed detection.

Products of random matrices appear also in many other fields that use techniques drawn from Markov
process theory. Examples include repeated interaction dynamics in quantum systems [25], inhomogeneous
Markov chains with random transition matrices [26], [25], infinite horizon control strategies for Markov
chains and non-autonomous linear differential equations [27], or discrete linear inclusions [28]. These
papers are usually concerned with deriving convergence results on these products and determining the
limiting matrix. Reference [25] studies the product of matrices belonging to a class of complex contraction
matrices and characterizes the limiting matrix by expressing the product as a sum of a decaying process,
which exponentially converges to zero, and a fluctuating process. Reference [27] establishes conditions
for strong and weak ergodicity for both forward and backward products of stochastic matrices, in terms
of the limiting points of the matrix sequence. Using the concept of infinite flow graph, which the authors
introduced in previous work, reference [26] characterizes the limiting matrix for the product of stochastic
matrices in terms of the topology of the infinite flow graph. For more structured matrices, [29] studies
products of nonnegative matrices. For nonnegative matrices, a comprehensive study of the asymptotic
behavior of the products can be found in [30]. A different line of research, closer to our work, is
concerned with the limiting distributions of the products (in the sense of the central limit theorem and
large deviations). The classes of matrices studied are: invertible matrices [31], [32] and their subclass of
matrices of determinant equal to 1 [33] and, also, positive matrices [34]. None of these apply to our
case, as the matrices that we consider might not be invertible ($W_k - J$ has a zero eigenvalue, for every
realization of $W_k$) and, also, we allow the entries of $W_k$ to be zero, and therefore the entries of $W_k - J$
might be negative with positive probability. Furthermore, as pointed out in [35], the results obtained
in [31], [32], [33] do not provide ways to effectively compute the rates of convergence. Reference [35]
improves on the existing literature in that sense by deriving more explicit bounds on the convergence
rates, while showing that, under certain assumptions on the matrices, the convergence rates do not depend
on the size of the matrices; the result is relevant from the perspective of large scale dynamical systems,
as it shows that, in some sense, more complex systems are not slower than systems of smaller scale, but
again it does not apply to our study.

To the best of our knowledge, the exact large deviations rate $I$ in (2) has not been computed for i.i.d.
averaging matrices $W_k$, nor for the commonly used sub-classes of gossip and link failure models. Results
in the existing literature provide upper and lower bounds on the rate $I$, but not the exact rate $I$. These
bounds are based on the second largest eigenvalue of $\mathbb{E}[W_k]$ or $\mathbb{E}[W_k^2]$, e.g., [4], [36], [6]. Our result (2)
refines these existing bounds, and sheds more light on the asymptotic convergence of the probabilities
in (1). For example, for the case when each realization of $W_k$ has a connected underlying support
graph (the case studied in [12]), we calculate the rate $I$ to be equal to $+\infty$ (see Section III), i.e., the
convergence of the probabilities in (1) is faster than exponential. On the other hand, the "rate" that
would result from the bound based on $\lambda_2(\mathbb{E}[W_k^2])$ is finite unless $W_k = J$. This is particularly relevant
with consensus+innovations algorithms, where, e.g., the consensus+innovations distributed detector is
asymptotically optimal if $I = \infty$ [37]; this fact cannot be seen from the bounds based on $\lambda_2(\mathbb{E}[W_k^2])$.

The rate $I$ is a valuable metric for the design of algorithms (or linear systems) driven by system matrices
$W_k$, as it determines the algorithm's asymptotic performance and is easily computable for commonly used
models. We demonstrate the usefulness of $I$ by optimizing the allocation of the sensors' transmission
power in a sensor network with fading (failing) links, for the purpose of distributed detection with the
consensus+innovations algorithm [23], [24].

Paper organization. Section II introduces the model for the random matrices $W_k$ and defines relevant
quantities needed in the sequel. Section III proves the result on the exact exponential rate $I$ of consensus.
Section IV shows how to compute the rate $I$ for gossip and link failure models via a min-cut problem.
Section V addresses optimal power allocation for distributed detection by maximizing the rate $I$. Finally,
Section VI concludes the paper.



II. Problem setup

Model for the random matrices $W_k$. Let $\{W_k : k = 1, 2, \ldots\}$ be a discrete time (random) process where
$W_k$, for all $k$, takes values in the set of doubly stochastic, symmetric, $N \times N$ matrices.

Assumption 1 We assume the following.

1) The random matrices $W_k$ are independent identically distributed (i.i.d.).

2) The entries of any realization $W$ of $W_k$ are bounded away from 0 whenever positive. That is, there
exists a scalar $\delta > 0$, such that, for any realization $W$, if $W_{ij} > 0$, then $W_{ij} \geq \delta$. An entry of $W_k$ with
positive value will be called an active entry.

3) For any realization $W$, for all $i$, $W_{ii} \geq \delta$.

Also, let $\mathcal{W}$ denote the set of all possible realizations of $W_k$.

Graph process. For a doubly stochastic symmetric matrix $W$, let $G(W)$ denote its induced undirected
graph, i.e., $G(W) = (V, E(W))$, where $V = \{1, 2, \ldots, N\}$ is the set of all nodes and

$$E(W) := \left\{ \{i,j\} \in \binom{V}{2} : W_{ij} > 0 \right\}.$$

We define the random graph process $\{G_t : t = 1, 2, \ldots\}$ through the random matrix process $\{W_k : k =
1, 2, \ldots\}$ by: $G_t = G(W_t)$, for $t = 1, 2, \ldots$. As the matrix process is i.i.d., the graph process is i.i.d. as
well. We collect the underlying graphs of all possible matrix realizations $W$ (in $\mathcal{W}$) in the set $\mathcal{G}$:

$$\mathcal{G} := \{ G(W) : W \in \mathcal{W} \}. \tag{3}$$

Thus, the random graphs $G_t$ take their realizations from $\mathcal{G}$. Similarly as with the matrix entries, if
$\{i,j\} \in E(G_t)$, we call $\{i,j\}$ an active link.

We remark that the conditions on the random matrix process from Assumption 1 are satisfied
automatically for any i.i.d. model with a finite space of matrices $\mathcal{W}$ ($\delta$ could be taken to be the minimum
over all positive entries over all matrices from $\mathcal{W}$). We illustrate with three instances of the random
matrix model the case when the (positive) entries of matrix realizations can continuously vary in certain
intervals, namely, gossip, $d$-adjacent edges at a time, and link failures.

Example 1 (Gossip model) Let $G = (V, E)$ be an arbitrary connected graph on $N$ vertices. With the
gossip algorithm on the graph $G$, every realization of $W_k$ has exactly two off-diagonal entries that are
active: $[W_k]_{ij} = [W_k]_{ji} > 0$, for some $\{i,j\} \in E$, where the entries are equal due to the symmetry
of $W_k$. Because $W_k$ is stochastic, we have that $[W_k]_{ii} = [W_k]_{jj} = 1 - [W_k]_{ij}$, which, together with
Assumption 1, implies that $[W_k]_{ij}$ must be bounded (almost surely) by $\delta \leq [W_k]_{ij} \leq 1 - \delta$. Therefore,
the set of matrix realizations in the gossip model is:

$$\mathcal{W}^{\mathrm{Gossip}} = \bigcup_{\{i,j\} \in E} \Big\{ A \in \mathbb{R}^{N \times N} : A_{ij} = A_{ji} = \alpha,\ A_{ii} = A_{jj} = 1 - \alpha,\ \alpha \in [\delta, 1 - \delta],$$
$$\qquad A_{ll} = 1, \text{ for } l \neq i,j,\ A_{ml} = 0, \text{ for } l \neq m \text{ and } l, m \neq i, j \Big\}.$$
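To make the gossip realization set concrete, the following minimal sketch builds one realization for a chosen link $\{i,j\}$ and weight $\alpha$ and reads off its induced edge set $E(W)$. The function names and the 0-based node indexing are assumptions for illustration.

```python
def gossip_matrix(n, i, j, alpha):
    """One gossip realization: nodes i and j exchange values with weight alpha."""
    W = [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]
    W[i][j] = W[j][i] = alpha
    W[i][i] = W[j][j] = 1.0 - alpha
    return W


def induced_edges(W):
    """Edge set E(W): unordered pairs (r, c), r < c, with a positive entry."""
    n = len(W)
    return {(r, c) for r in range(n) for c in range(r + 1, n) if W[r][c] > 0}


# Hypothetical example: 4 nodes, link {1, 3} active with alpha = 0.5.
W_example = gossip_matrix(4, 1, 3, 0.5)
edges_example = induced_edges(W_example)
```

Each such matrix is symmetric with unit row sums, and its induced graph is the one-link graph on the activated pair.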

Example 2 (Averaging model with $d$-adjacent edges at a time) Let $G = (V, E)$ be a $d$-regular
connected graph on $N$ vertices, $d \leq N - 1$. Consider the following averaging scheme where exactly $2d$
off-diagonal entries of $W_k$ are active at a time: $[W_k]_{ij} = [W_k]_{ji} > 0$, for some fixed $i \in V$ and all
$j \in V$ such that $\{i,j\} \in E$. In other words, at each time in this scheme, the set of active edges is the
set of edges adjacent to some node $i \in V$. Taking into account Assumption 1 on $W_k$, the set of matrix
realizations for this averaging model is:

$$\mathcal{W}^{d\text{-adjacent}} = \bigcup_{i \in V} \Big\{ A \in \mathbb{R}^{N \times N} : A = A^\top,\ A_i = v,\ v \in \mathbb{R}^N,\ v_j = 0 \text{ if } j \neq i \text{ and } \{i,j\} \notin E,\ 1^\top v = 1,\ v_j \geq \delta \text{ otherwise},$$
$$\qquad A_{jj} = 1 - A_{ij}, \text{ if } \{i,j\} \in E,\ A_{ll} = 1 \text{ if } l \neq i \text{ and } \{i,l\} \notin E, \text{ and all remaining off-diagonal entries equal to } 0 \Big\},$$

where $A_i$ denotes the $i$th column of the matrix $A$.

Example 3 (Link failure (Bernoulli) model) Let $G = (V, E)$ be an arbitrary connected graph on $N$
vertices. With link failures, the occurrence of each edge in $E$ is a Bernoulli random variable and occurrences
of edges are independent. Due to independence, each subgraph $H = (V, F)$ of $G$, $F \subseteq E$, is a realizable
graph in this model. Also, for any given subgraph $H$ of $G$, any matrix $W$ with the sparsity pattern of
the Laplacian matrix of $H$ and satisfying Assumption 1 is a realizable matrix. Therefore, the set of all
realizable matrices in the link failure model is

$$\mathcal{W}^{\mathrm{Link\ fail.}} = \bigcup_{F \subseteq E} \Big\{ A \in \mathbb{R}^{N \times N} : A = A^\top,\ A_{ij} \geq \delta \text{ if } \{i,j\} \in F,\ A_{ij} = 0 \text{ if } \{i,j\} \notin F,\ A 1 = 1 \Big\}.$$

Supergraph of a collection of graphs and supergraph disconnected collections. For a collection of
graphs $\mathcal{H}$ on the same set of vertices $V$, let $\Gamma(\mathcal{H})$ denote the graph that contains all edges from all graphs
in $\mathcal{H}$. That is, $\Gamma(\mathcal{H})$ is the minimal graph (i.e., the graph with the minimal number of edges) that is a
supergraph of every graph in $\mathcal{H}$:

$$\Gamma(\mathcal{H}) := \Big( V, \bigcup_{G \in \mathcal{H}} E(G) \Big), \tag{4}$$

where $E(G)$ denotes the set of edges of the graph $G$.

Specifically, we denote by $\Gamma(s,t)$³ the random graph that collects the edges from all the graphs $G_r$
that appeared from time $r = t + 1$ to $r = s$, $s > t$, i.e.,

$$\Gamma(s,t) := \Gamma(\{G_s, G_{s-1}, \ldots, G_{t+1}\}).$$

Also, for a collection $\mathcal{H} \subseteq \mathcal{G}$, we use $p_{\mathcal{H}}$ to denote the probability that a graph realization $G_t$ belongs
to $\mathcal{H}$:

$$p_{\mathcal{H}} := \mathbb{P}(G_t \in \mathcal{H}). \tag{5}$$
We next define collections of realizable graphs of certain types that will be important in computing 
the rate in (2). 

Definition 4 The collection $\mathcal{H} \subseteq \mathcal{G}$ is a disconnected collection of $\mathcal{G}$ if its supergraph $\Gamma(\mathcal{H})$ is
disconnected.

Thus, a disconnected collection is any collection of realizable graphs such that the union of all of its
graphs yields a disconnected graph. We also define the set of all possible disconnected collections of $\mathcal{G}$:

$$\Pi(\mathcal{G}) = \{ \mathcal{H} \subseteq \mathcal{G} : \mathcal{H} \text{ is a disconnected collection on } \mathcal{G} \}. \tag{6}$$

We further refine this set to find the largest possible disconnected collections on $\mathcal{G}$.

Definition 5 We say that a collection $\mathcal{H} \subseteq \mathcal{G}$ is a maximal disconnected collection of $\mathcal{G}$ (or, shortly,
maximal) if:

i) $\mathcal{H} \in \Pi(\mathcal{G})$, i.e., $\mathcal{H}$ is a disconnected collection on $\mathcal{G}$; and

ii) for every $G \in \mathcal{G} \setminus \mathcal{H}$, $\Gamma(\mathcal{H} \cup \{G\})$ is connected.

In words, $\mathcal{H}$ is maximal if the graph $\Gamma(\mathcal{H})$ that collects all edges of all graphs in $\mathcal{H}$ is disconnected, but
adding all the edges of any of the remaining graphs (that are not in $\mathcal{H}$) yields a connected graph. We
also define the set of all possible maximal collections of $\mathcal{G}$:

$$\Pi^\star(\mathcal{G}) = \{ \mathcal{H} \subseteq \mathcal{G} : \mathcal{H} \text{ is a maximal collection on } \mathcal{G} \}. \tag{7}$$

We remark that $\Pi^\star(\mathcal{G}) \subseteq \Pi(\mathcal{G})$. We now illustrate the set of all possible graph realizations $\mathcal{G}$, and its
maximal collections $\mathcal{H}$, with two examples.

³ Graph $\Gamma(s,t)$ is associated with the matrix product $W_s \cdots W_{t+1}$ going from time $t + 1$ until time $s > t$. The notation $\Gamma(s,t)$
indicates that the product is backwards; see also the definition of the product matrix $\Phi(s,t)$ in Section III.

Example 6 (Gossip model) If the random matrix process is defined by the gossip algorithm on the full
graph on $N$ vertices, then $\mathcal{G} = \left\{ (V, \{i,j\}) : \{i,j\} \in \binom{V}{2} \right\}$; in words, $\mathcal{G}$ is the set of all possible one-link
graphs on $N$ vertices. An example of a maximal collection of $\mathcal{G}$ is

$$\mathcal{G} \setminus \{ (V, \{i,j\}) : j = 1, \ldots, N,\ j \neq i \},$$

where $i$ is a fixed vertex, or, in words, the collection of all one-link graphs except those whose link
is adjacent to $i$. Another example is

$$\mathcal{G} \setminus \big( \{ (V, \{i,k\}) : k = 1, \ldots, N,\ k \neq i,\ k \neq j \} \cup \{ (V, \{j,l\}) : l = 1, \ldots, N,\ l \neq i,\ l \neq j \} \big).$$

Example 7 (Toy example) Consider a network of five nodes with the set of realizable graphs $\mathcal{G} =
\{G_1, G_2, G_3\}$, where the graphs $G_i$, $i = 1, 2, 3$, are given in Figure 1. In this model, each realizable
graph is a two-link graph, and the supergraph of all the realizable graphs $\Gamma(\{G_1, G_2, G_3\})$ is connected.

Fig. 1. Example of a five node network with three possible graph realizations, each being a two-link graph

If we scan over the supergraphs $\Gamma(\mathcal{H})$ of all subsets $\mathcal{H}$ of $\mathcal{G}$, we see that $\Gamma(\{G_1, G_2\})$, $\Gamma(\{G_2, G_3\})$ and
$\Gamma(\{G_1, G_2, G_3\})$ are connected, whereas $\Gamma(\{G_1, G_3\})$ and $\Gamma(\{G_i\}) = G_i$, $i = 1, 2, 3$, are disconnected.
Therefore, $\Pi(\mathcal{G}) = \{\{G_1\}, \{G_2\}, \{G_3\}, \{G_1, G_3\}\}$ and $\Pi^\star(\mathcal{G}) = \{\{G_2\}, \{G_1, G_3\}\}$.
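The scan over subsets in Example 7 can be reproduced mechanically. In the sketch below, the edge sets assigned to $G_1$, $G_2$, $G_3$ are hypothetical (Figure 1 is not reproduced here); they are chosen only so that the connectivity pattern stated in the example holds. Nodes are indexed from 0, and only nonempty collections are enumerated, as in the example.

```python
from itertools import combinations


def is_connected(num_nodes, edges):
    """Depth-first search connectivity test on nodes 0..num_nodes-1."""
    adj = {v: set() for v in range(num_nodes)}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, stack = {0}, [0]
    while stack:
        for nb in adj[stack.pop()]:
            if nb not in seen:
                seen.add(nb)
                stack.append(nb)
    return len(seen) == num_nodes


def disconnected_collections(num_nodes, graphs):
    """All nonempty sets of graph names whose supergraph is disconnected."""
    names = sorted(graphs)
    found = []
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            union = set().union(*(graphs[g] for g in subset))
            if not is_connected(num_nodes, union):
                found.append(frozenset(subset))
    return found


def maximal_collections(num_nodes, graphs):
    """Disconnected collections that become connected when any other graph is added."""
    disc = set(disconnected_collections(num_nodes, graphs))
    return [h for h in disc
            if all((h | {g}) not in disc for g in graphs if g not in h)]


# Hypothetical two-link graphs on five nodes (an assumption, not Figure 1 itself).
GRAPHS = {
    "G1": {(1, 2), (3, 4)},
    "G2": {(0, 1), (2, 3)},
    "G3": {(1, 3), (3, 4)},
}
all_disconnected = set(disconnected_collections(5, GRAPHS))
all_maximal = set(maximal_collections(5, GRAPHS))
```

With these edge sets, the enumeration returns the same $\Pi(\mathcal{G})$ and $\Pi^\star(\mathcal{G})$ as listed in the example. The search is exponential in the number of realizable graphs, so it serves only as a check on small models.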

We now observe that, if the graph $\Gamma(s,t)$ that collects all the edges that appeared from time $t + 1$ to
time $s$ is disconnected, then all the graphs $G_r$ that appeared from $r = t + 1$ through $r = s$ belong to
some maximal collection $\mathcal{H}$.

Observation 8 If, for some $s$ and $t$, $s > t$, $\Gamma(s,t)$ is disconnected, then there exists a maximal collection
$\mathcal{H} \in \Pi^\star(\mathcal{G})$, such that $G_r \in \mathcal{H}$, for every $r$, $t < r \leq s$.

III. Exponential rate for consensus 

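Denote $\Phi(s,t) := W_s W_{s-1} \cdots W_{t+1}$, and $\widetilde{\Phi}(s,t) := \Phi(s,t) - J$, for $s > t \geq 0$. The following theorem
gives the exponential decay rate of the probability $\mathbb{P}\big( \| \widetilde{\Phi}(k,0) \| \geq \epsilon \big)$.

Theorem 9 Consider the random process $\{W_k : k = 1, 2, \ldots\}$ under Assumption 1. Then:

$$\lim_{k \to \infty} \frac{1}{k} \log \mathbb{P}\big( \| \widetilde{\Phi}(k,0) \| \geq \epsilon \big) = -I, \quad \forall \epsilon \in (0, 1],$$

where

$$I = \begin{cases} +\infty & \text{if } \Pi^\star(\mathcal{G}) = \emptyset \\ |\log p_{\max}| & \text{otherwise} \end{cases}$$

and

$$p_{\max} = \max_{\mathcal{H} \in \Pi^\star(\mathcal{G})} p_{\mathcal{H}}$$

is the probability of the most likely maximal disconnected collection.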
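For a model with a small, finite set of realizable graphs, the rate in Theorem 9 can be evaluated by direct enumeration. The sketch below is an illustration under stated assumptions, not the computational method of the paper: it maximizes $p_{\mathcal{H}}$ over all nonempty disconnected collections, which gives the same value as maximizing over maximal collections because $p_{\mathcal{H}}$ can only grow when graphs are added and every disconnected collection is contained in a maximal one. The function names, the edge sets, and the probabilities are hypothetical.

```python
import math
from itertools import combinations


def connected(num_nodes, edges):
    """Union-find connectivity test on nodes 0..num_nodes-1."""
    parent = list(range(num_nodes))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]   # path halving
            v = parent[v]
        return v

    for a, b in edges:
        parent[find(a)] = find(b)
    return len({find(v) for v in range(num_nodes)}) == 1


def consensus_rate(num_nodes, graph_probs):
    """Rate I for a list of (edge_set, probability) pairs, one per realizable graph."""
    p_max, found = 0.0, False
    idx = range(len(graph_probs))
    for size in range(1, len(graph_probs) + 1):
        for subset in combinations(idx, size):
            union = set().union(*(graph_probs[g][0] for g in subset))
            if not connected(num_nodes, union):
                found = True
                p_max = max(p_max, sum(graph_probs[g][1] for g in subset))
    # No disconnected collection: the rate is +infinity.
    return abs(math.log(p_max)) if found else math.inf


# Hypothetical five-node model with three two-link graphs and assumed probabilities.
toy_rate = consensus_rate(5, [({(1, 2), (3, 4)}, 0.5),
                              ({(0, 1), (2, 3)}, 0.2),
                              ({(1, 3), (3, 4)}, 0.3)])

# Hypothetical gossip on a path with link probabilities 0.5, 0.3, 0.2.
path_rate = consensus_rate(4, [({(0, 1)}, 0.5), ({(1, 2)}, 0.3), ({(2, 3)}, 0.2)])
```

In the first model the most likely maximal collection is the pair of graphs with probabilities 0.5 and 0.3, so $p_{\max} = 0.8$. In the path example $p_{\max} = 1 - 0.2$, which agrees with the tree formula $|\log(1 - p_{ij})|$ from the Introduction.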

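To prove Theorem 9, we first consider the case when $\Pi^\star(\mathcal{G})$ is nonempty, and thus when $p_{\max} > 0$. In
this case, we find the rate $I$ by showing the lower and the upper bounds:

$$\liminf_{k \to \infty} \frac{1}{k} \log \mathbb{P}\big( \| \widetilde{\Phi}(k,0) \| \geq \epsilon \big) \geq \log p_{\max} \tag{8}$$

$$\limsup_{k \to \infty} \frac{1}{k} \log \mathbb{P}\big( \| \widetilde{\Phi}(k,0) \| \geq \epsilon \big) \leq \log p_{\max}. \tag{9}$$

Subsection III-A proves the lower bound (8), and Subsection III-B proves the upper bound (9).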

A. Proof of the lower bound (8) 

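We first find the rate for the probability that the network stays disconnected over the interval $1, \ldots, k$.

Lemma 10

$$\lim_{k \to \infty} \frac{1}{k} \log \mathbb{P}\big( \Gamma(k,0) \text{ is disconnected} \big) = \log p_{\max}.$$

Having Lemma 10, the lower bound (8) follows from the following relation:

$$\mathbb{P}\big( \| \widetilde{\Phi}(k,0) \| \geq \epsilon \big) \geq \mathbb{P}\big( \| \widetilde{\Phi}(k,0) \| = 1 \big) = \mathbb{P}\big( \Gamma(k,0) \text{ is disconnected} \big),$$

where the equality is stated and proven in Lemma 13 further ahead.

March 1, 2012 DRAFT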

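Proof of Lemma 10: If all the graph realizations until time $k$ belong to a certain maximal collection
$\mathcal{H}$, then, by the definition of a maximal collection, $\Gamma(k,0)$ is disconnected with probability 1. Therefore, for any
maximal collection $\mathcal{H}$, the following bound holds:

$$\mathbb{P}\big( \Gamma(k,0) \text{ is disconnected} \big) \geq \mathbb{P}\big( G_t \in \mathcal{H},\ t = 1, \ldots, k \big) = p_{\mathcal{H}}^k.$$

The best bound, over all maximal collections $\mathcal{H}$, is the one that corresponds to the "most likely" maximal
collection:

$$\mathbb{P}\big( \Gamma(k,0) \text{ is disconnected} \big) \geq p_{\max}^k. \tag{10}$$

We will next show that an upper bound with the same rate of decay (equal to $|\log p_{\max}|$) holds for
the probability of the network staying disconnected. To show this, we reason as follows: if $\Gamma(k,0)$ is
disconnected, then all the graph realizations until time $k$, $G_1, \ldots, G_k$, belong to some maximal collection.
It follows that

$$\mathbb{P}\big( \Gamma(k,0) \text{ is disconnected} \big) = \mathbb{P}\Big( \bigcup_{\mathcal{H} \in \Pi^\star(\mathcal{G})} \{ G_t \in \mathcal{H}, \text{ for } t = 1, \ldots, k \} \Big)$$
$$\leq \sum_{\mathcal{H} \in \Pi^\star(\mathcal{G})} \mathbb{P}\big( G_t \in \mathcal{H}, \text{ for } t = 1, \ldots, k \big)$$
$$= \sum_{\mathcal{H} \in \Pi^\star(\mathcal{G})} p_{\mathcal{H}}^k.$$

Finally, we bound each term in the previous sum by the probability $p_{\max}$ of the most likely maximal
collection, and we obtain:

$$\mathbb{P}\big( \Gamma(k,0) \text{ is disconnected} \big) \leq |\Pi^\star(\mathcal{G})|\, p_{\max}^k, \tag{11}$$

where $|\Pi^\star(\mathcal{G})|$ is the number of maximal collections on $\mathcal{G}$.
Combining (10) and (11) we get:

$$p_{\max}^k \leq \mathbb{P}\big( \Gamma(k,0) \text{ is disconnected} \big) \leq |\Pi^\star(\mathcal{G})|\, p_{\max}^k,$$

which implies

$$\lim_{k \to \infty} \frac{1}{k} \log \mathbb{P}\big( \Gamma(k,0) \text{ is disconnected} \big) = \log p_{\max}.$$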
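The last implication can be spelled out as a short worked step. Taking logarithms in the two-sided bound obtained from (10) and (11) and dividing by $k$ gives the chain below; the only fact used beyond (10) and (11) is that the number of maximal collections is finite, since there are finitely many graphs on $N$ nodes.

```latex
\log p_{\max}
\;\le\; \frac{1}{k}\log \mathbb{P}\big(\Gamma(k,0)\ \text{is disconnected}\big)
\;\le\; \frac{\log |\Pi^\star(\mathcal{G})|}{k} + \log p_{\max},
\qquad
\frac{\log |\Pi^\star(\mathcal{G})|}{k} \xrightarrow[k\to\infty]{} 0 .
```

Both sides therefore converge to $\log p_{\max}$, and the combinatorial factor $|\Pi^\star(\mathcal{G})|$ affects only the finite-$k$ value of the probability, not its exponential rate.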
B. Proof of the upper bound (9)

The next lemma relates the products of the weight matrices $\Phi(s,t)$ with the corresponding graph $\Gamma(s,t)$
and is the key point in our analysis. Recall that $\widetilde{\Phi}(s,t) := \Phi(s,t) - J$.

Lemma 11 For any realization of matrices $W_r \in \mathcal{W}$, $r = t + 1, \ldots, s$, $s > t$:⁴

1) if $[\Phi(s,t)]_{ij} > 0$, $i \neq j$, then $[\Phi(s,t)]_{ij} \geq \delta^{s-t}$;

2) for $i = 1, \ldots, N$, $[\Phi(s,t)]_{ii} \geq \delta^{s-t}$;

3) if $[\Phi(s,t)]_{ij} > 0$, $i \neq j$, then $[\Phi(s,t)^\top \Phi(s,t)]_{ij} \geq \delta^{2(s-t)}$;

4) $\| \widetilde{\Phi}(s,t) \| \leq \big( 1 - \delta^{2(s-t)} \lambda_F( L(\Gamma(s,t)) ) \big)^{1/2}$,

where $L(G)$ is the Laplacian matrix of the graph $G$, and $\lambda_F(A)$ is the second smallest eigenvalue (the
Fiedler eigenvalue) of a positive semidefinite matrix $A$.

Proof: Parts 1 and 2 are a consequence of the fact that the positive entries of the weight matrices
are bounded below by $\delta$ by Assumption 1; for the proofs of 1 and 2, see [38], Lemma 1 a), b). Part 3
follows from parts 1 and 2, by noticing that, for all $i \neq j$ such that $[\Phi(s,t)]_{ij} > 0$, we have:

$$[\Phi(s,t)^\top \Phi(s,t)]_{ij} = \sum_{l=1}^{N} [\Phi(s,t)]_{li} [\Phi(s,t)]_{lj} \geq [\Phi(s,t)]_{ii} [\Phi(s,t)]_{ij} \geq \delta^{2(s-t)}.$$

To show part 4, we notice first that $\| \widetilde{\Phi}(s,t) \|^2$ is the second largest eigenvalue of $\Phi(s,t)^\top \Phi(s,t)$, and,
thus, can be computed as

$$\| \widetilde{\Phi}(s,t) \|^2 = \max_{q^\top q = 1,\ q \perp 1} q^\top \Phi(s,t)^\top \Phi(s,t) q.$$

Since $\Phi(s,t)^\top \Phi(s,t)$ is a symmetric stochastic matrix, it can be shown, e.g., [12], that its quadratic
form can be written as:

$$q^\top \Phi(s,t)^\top \Phi(s,t) q = q^\top q - \sum_{\{i,j\}} [\Phi(s,t)^\top \Phi(s,t)]_{ij} (q_i - q_j)^2$$
$$\leq 1 - \delta^{2(s-t)} \sum_{\{i,j\} : [\Phi(s,t)^\top \Phi(s,t)]_{ij} > 0} (q_i - q_j)^2, \tag{12}$$

where the last inequality follows from part 3. Further, if the graph $\Gamma(s,t)$ contains some link $\{i,j\}$, then,
at some time $r$, $t < r \leq s$, a realization $W_r$ with $[W_r]_{ij} > 0$ occurs. Since the diagonal entries of all the
realizations of the weight matrices are positive (and, in particular, those from time $r$ to time $s$), the fact
that $[W_r]_{ij} > 0$ implies that $[\Phi(s,t)]_{ij} > 0$. This, in turn, implies

$$\big\{ \{i,j\} \in \Gamma(s,t) \big\} \subseteq \big\{ \{i,j\} : [\Phi(s,t)^\top \Phi(s,t)]_{ij} > 0 \big\}.$$

Using the latter, and the fact that the entries of $\Phi(s,t)^\top \Phi(s,t)$ are non-negative, we can bound the sum
in (12) over $\{ \{i,j\} : [\Phi(s,t)^\top \Phi(s,t)]_{ij} > 0 \}$ by the sum over $\{ \{i,j\} \in \Gamma(s,t) \}$ only, yielding

$$q^\top \Phi(s,t)^\top \Phi(s,t) q \leq 1 - \delta^{2(s-t)} \sum_{\{i,j\} \in \Gamma(s,t)} (q_i - q_j)^2.$$

Finally, $\min_{q^\top q = 1,\ q \perp 1} \sum_{\{i,j\} \in \Gamma(s,t)} (q_i - q_j)^2$ is equal to the Fiedler eigenvalue (i.e., the second smallest
eigenvalue) of the Laplacian $L(\Gamma(s,t))$. This completes the proof of part 4 and Lemma 11. ■

⁴ The statements of the results in subsequent Corollary 12 and Lemma 13 are also in the point-wise sense.
We have the following corollary of part 4 of Lemma 11, which, for a fixed interval length $s - t$, and for
the case when $\Gamma(s,t)$ is connected, gives a uniform bound for the spectral norm of $\widetilde{\Phi}(s,t)$.

Corollary 12 For any $s$ and $t$, $s > t$, if $\Gamma(s,t)$ is connected, then

$$\| \widetilde{\Phi}(s,t) \| \leq \big( 1 - c\, \delta^{2(s-t)} \big)^{1/2}, \tag{13}$$

where $c = 2\left(1 - \cos\frac{\pi}{N}\right)$ is the Fiedler value of the path graph on $N$ vertices, i.e., the minimum of
$\lambda_F(L(G)) > 0$ over all connected graphs on $N$ vertices [39].

Proof: The claim follows from part 4 of Lemma 11 and from the fact that, for connected $\Gamma(s,t)$:

$$\lambda_F( L(\Gamma(s,t)) ) \geq c = \min_{G \text{ is connected}} \lambda_F(L(G)). \qquad ■$$
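The constant $c$ and the bound (13) are easy to evaluate numerically. The sketch below checks, without any linear algebra library, that $c = 2(1 - \cos(\pi/N))$ is an eigenvalue of the path-graph Laplacian by applying the Laplacian to the vector with entries $\cos(\pi(j + 1/2)/N)$, $j = 0, \ldots, N-1$ (a standard closed form for the path-graph eigenvectors). It does not by itself verify that this is the second smallest eigenvalue; that part relies on the known spectrum $2 - 2\cos(\pi m/N)$, $m = 0, \ldots, N-1$. Function names and parameter values are assumptions for illustration.

```python
import math


def path_fiedler_value(n):
    """c = 2 (1 - cos(pi / n)) for the path graph on n >= 2 vertices."""
    return 2.0 * (1.0 - math.cos(math.pi / n))


def path_laplacian_times(v):
    """Multiply the path-graph Laplacian by the vector v without forming the matrix."""
    n = len(v)
    out = []
    for j in range(n):
        degree = (j > 0) + (j < n - 1)
        acc = degree * v[j]
        if j > 0:
            acc -= v[j - 1]
        if j < n - 1:
            acc -= v[j + 1]
        out.append(acc)
    return out


def eigen_residual(n):
    """Largest entry of |L v - c v| for the candidate Fiedler vector v."""
    c = path_fiedler_value(n)
    v = [math.cos(math.pi * (j + 0.5) / n) for j in range(n)]
    Lv = path_laplacian_times(v)
    return max(abs(a - c * b) for a, b in zip(Lv, v))


def norm_bound(n, delta, length):
    """Right-hand side of (13) for an interval of the given length s - t."""
    return math.sqrt(1.0 - path_fiedler_value(n) * delta ** (2 * length))


residual_5 = eigen_residual(5)
bound_example = norm_bound(5, 0.1, 3)    # hypothetical N, delta, and s - t
```

The bound is strictly below 1 but approaches 1 quickly as the interval length $s - t$ grows, which is why the later analysis tracks how often the supergraph becomes connected.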

The previous result, as well as part 4 of Lemma 11, imply that, if the graph $\Gamma(s,t)$ is connected, then
the spectral norm of $\widetilde{\Phi}(s,t)$ is smaller than 1. It turns out that the connectedness of $\Gamma(s,t)$ is not only
sufficient, but it is also a necessary condition for $\| \widetilde{\Phi}(s,t) \| < 1$. Lemma 13 explains this.

Lemma 13 For any $s$ and $t$, $s > t$:

$$\| \widetilde{\Phi}(s,t) \| < 1 \iff \Gamma(s,t) \text{ is connected}.$$

Proof: We first show the if part. Suppose $\Gamma(s,t)$ is connected. Then, $\lambda_F( L(\Gamma(s,t)) ) > 0$ and the
claim follows by part 4 of Lemma 11. We prove the only if part by proving the following equivalent
statement:

$$\Gamma(s,t) \text{ is not connected} \implies \| \widetilde{\Phi}(s,t) \| = 1.$$

To this end, suppose that $\Gamma(s,t)$ is not connected and, without loss of generality, suppose that $\Gamma(s,t)$ has
two components $C_1$ and $C_2$. Then, for $i \in C_1$ and $j \in C_2$, $\{i,j\} \notin \Gamma(s,t)$, and, consequently, $\{i,j\} \notin G_r$,
for all $r$, $t < r \leq s$. By definition of $G_r$, this implies that the $i,j$-th entry in the corresponding weight
matrix is equal to zero, i.e.,

$$\forall r,\ t < r \leq s : [W_r]_{ij} = 0, \quad \forall \{i,j\} \text{ s.t. } i \in C_1,\ j \in C_2.$$

Thus, every matrix realization $W_r$ from time $r = t + 1$ to time $r = s$ has a block diagonal form (up to
a symmetric permutation of rows and columns)

$$W_r = \begin{bmatrix} [W_r]_{C_1} & 0 \\ 0 & [W_r]_{C_2} \end{bmatrix},$$

where $[W_r]_{C_1}$ is the block of $W_r$ corresponding to the nodes in $C_1$, and similarly for $[W_r]_{C_2}$. This
implies that $\Phi(s,t)$ will have the same block diagonal form (so any zero-mean vector that is constant on $C_1$
and on $C_2$ is left unchanged by $\Phi(s,t)$), which, in turn, proves that $\| \widetilde{\Phi}(s,t) \| = 1$.
This completes the proof of the only if part and the proof of Lemma 13. ■

We next define the sequence of stopping times $T_i$, $i = 1, 2, \ldots$, by:

$$T_i = \min\{ t \geq T_{i-1} + 1 : \Gamma(t, T_{i-1}) \text{ is connected} \}, \text{ for } i \geq 1, \qquad T_0 = 0. \tag{14}$$



The sequence $\{T_i\}_{i \geq 1}$ defines the times when the network becomes connected, and, equivalently, when
the averaging process makes an improvement (i.e., when the spectral norm of $\widetilde{\Phi}$ drops below 1).
For fixed time $k \geq 1$, let $M_k$ denote the number of improvements until time $k$:

$$M_k = \max\{ i \geq 0 : T_i \leq k \}. \tag{15}$$



We now explain how, at any given time $k$, we can use the knowledge of $M_k$ to bound the norm of the
"error" matrix $\widetilde{\Phi}(k,0)$. Suppose that $M_k = m$. If we knew the locations of all the improvements until
time $k$, $T_i = t_i$, $i = 1, \ldots, m$, then, using eq. (13), we could bound the norm of $\widetilde{\Phi}(k,0)$. Intuitively, since
for fixed $k$ and fixed $m$ the number of allocations of the $T_i$'s is finite, there will exist one that yields
the worst bound on $\| \widetilde{\Phi}(k,0) \|$. It turns out that the worst case allocation is the one with equidistant
improvements, thus allowing for deriving a bound on $\| \widetilde{\Phi}(k,0) \|$ only in terms of $M_k$. This result is given
in Lemma 14.



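Lemma 14 For any realization of $W_1, W_2, \ldots, W_k$ and $k = 1, 2, \ldots$ the following holds:

$$\| \widetilde{\Phi}(k,0) \| \leq \Big( 1 - c\, \delta^{\frac{2k}{M_k}} \Big)^{\frac{M_k}{2}}. \tag{16}$$

Proof: Suppose $M_k = m$ and $T_1 = t_1$, $T_2 = t_2$, $\ldots$, $T_m = t_m \leq k$ ($T_i > k$, for $i > m$, because
$M_k = m$). Then, by Corollary 12, for $i = 1, \ldots, m$, we have

$$\| \widetilde{\Phi}(t_i, t_{i-1}) \| \leq \big( 1 - c\, \delta^{2(t_i - t_{i-1})} \big)^{1/2}.$$

Combining this with submultiplicativity of the spectral norm, we get:

$$\| \widetilde{\Phi}(k,0) \| = \| \widetilde{\Phi}(k, t_m) \widetilde{\Phi}(t_m, t_{m-1}) \cdots \widetilde{\Phi}(t_1, 0) \|$$
$$\leq \| \widetilde{\Phi}(k, t_m) \| \, \| \widetilde{\Phi}(t_m, t_{m-1}) \| \cdots \| \widetilde{\Phi}(t_1, 0) \|$$
$$\leq \prod_{i=1}^{m} \big( 1 - c\, \delta^{2(t_i - t_{i-1})} \big)^{1/2}. \tag{17}$$

To show (16), we find the worst case of the $t_i$'s, $i = 1, \ldots, m$, by solving the following problem:

$$\max_{\{ \sum_{i=1}^{m} \Delta_i \leq k,\ \Delta_i \in \{1, \ldots, k\} \}} \prod_{i=1}^{m} \big( 1 - c\, \delta^{2 \Delta_i} \big)^{1/2} = \max_{\{ \sum_{i=1}^{m} \beta_i \leq 1,\ \beta_i \in \{ \frac{1}{k}, \frac{2}{k}, \ldots, 1 \} \}} \prod_{i=1}^{m} \big( 1 - c\, \delta^{2 \beta_i k} \big)^{1/2}$$
$$\leq \max_{\{ \sum_{i=1}^{m} \beta_i \leq 1,\ \beta_i \geq 0 \}} \prod_{i=1}^{m} \big( 1 - c\, \delta^{2 \beta_i k} \big)^{1/2} \tag{18}$$

(here $\Delta_i$ should be thought of as $t_i - t_{i-1}$). Taking the log of the cost function, we get a convex problem
equivalent to the original one (it can be shown that the log of the cost function is concave). The maximum is
achieved for $\beta_i = \frac{1}{m}$, $i = 1, \ldots, m$. This completes the proof of Lemma 14. ■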
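The claim that equidistant improvements give the worst case can be checked by brute force for small $k$ and $m$. The sketch below enumerates all integer gap allocations, evaluates the product bound (17) for each, and compares the largest value with the right-hand side of (16). The parameter values ($N = 5$, $\delta = 0.3$, and the choices of $k$ and $m$) and all function names are assumptions made for illustration.

```python
import math
from itertools import product


def product_bound(c, delta, gaps):
    """Bound (17): product of the factors (1 - c * delta^(2 * gap))^(1/2)."""
    return math.prod(math.sqrt(1.0 - c * delta ** (2 * g)) for g in gaps)


def worst_allocation(c, delta, k, m):
    """Largest value of (17) over integer gaps >= 1 that sum to at most k."""
    best = 0.0
    for gaps in product(range(1, k + 1), repeat=m):
        if sum(gaps) <= k:
            best = max(best, product_bound(c, delta, gaps))
    return best


def lemma14_bound(c, delta, k, m):
    """Right-hand side of (16) with M_k = m."""
    return (1.0 - c * delta ** (2.0 * k / m)) ** (m / 2.0)


c5 = 2.0 * (1.0 - math.cos(math.pi / 5))        # Fiedler value of the 5-node path
divisible = (worst_allocation(c5, 0.3, 9, 3), lemma14_bound(c5, 0.3, 9, 3))
non_divisible = (worst_allocation(c5, 0.3, 8, 3), lemma14_bound(c5, 0.3, 8, 3))
```

When $m$ divides $k$, the equidistant allocation is feasible and the two values coincide; otherwise the brute-force maximum stays below the bound (16), consistent with the relaxation from integer to real-valued $\beta_i$ in (18).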
Lemma 14 provides a bound on the norm of the "error" matrix $\widetilde{\Phi}(k,0)$ in terms of the number of
improvements $M_k$ up to time $k$. Intuitively, if $M_k$ is high enough relative to $k$, then the norm of $\widetilde{\Phi}(k,0)$
cannot stay above $\epsilon$ as $k$ increases (to see this, just take $M_k = k$ in eq. (16)). We show that this is indeed
true for all random sequences $G_1, G_2, \ldots$ for which $M_k = \alpha k$ or higher, for any choice of $\alpha \in (0, 1]$;
this result is stated in Lemma 15, part 1. On the other hand, if the number of improvements is less than
$\alpha k$, then there are at least $k - \alpha k$ available slots in the graph sequence in which the graphs from the
maximal collection can appear. This yields, in a crude approximation, the probability of $p_{\max}^{k - \alpha k}$ for the
event $M_k < \alpha k$; part 2 of Lemma 15 gives the exact bound on this probability in terms of $\alpha$. We next
state Lemma 15.

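Lemma 15 Consider the sequence of events $\{M_k \ge \alpha k\}$, where $\alpha \in (0,1]$, $k = 1, 2, \ldots$ For every $\alpha, \epsilon \in (0,1]$:

1) There exists sufficiently large $k_0 = k_0(\alpha, \epsilon)$ such that

$$\mathbb{P}\left(\left\|\widetilde{\Phi}(k,0)\right\| \ge \epsilon,\ M_k \ge \alpha k\right) = 0, \quad \forall k \ge k_0(\alpha, \epsilon). \tag{19}$$

2)

$$\limsup_{k \to \infty} \frac{1}{k} \log \mathbb{P}\left(\left\|\widetilde{\Phi}(k,0)\right\| \ge \epsilon,\ M_k < \alpha k\right) \le -\alpha \log \alpha + \alpha \log |\Pi^\star(\mathcal{G})| + (1 - \alpha) \log p_{\max}. \tag{20}$$

Proof: To prove part 1, we first note that, by Lemma 14, we have:

$$\left\{\left\|\widetilde{\Phi}(k,0)\right\| \ge \epsilon\right\} \subseteq \left\{\left(1 - c\,\delta^{2k/M_k}\right)^{M_k/2} \ge \epsilon\right\}. \tag{21}$$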

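This gives for fixed $\alpha$, $\epsilon$:

$$\begin{aligned}
\mathbb{P}\left(\left\|\widetilde{\Phi}(k,0)\right\| \ge \epsilon,\ M_k \ge \alpha k\right)
&\le \mathbb{P}\left(\left(1 - c\,\delta^{2k/M_k}\right)^{M_k/2} \ge \epsilon,\ M_k \ge \alpha k\right) \\
&= \sum_{m = \lceil \alpha k \rceil}^{k} \mathbb{P}\left(\left(1 - c\,\delta^{2k/m}\right)^{m/2} \ge \epsilon,\ M_k = m\right) \\
&= \sum_{m = \lceil \alpha k \rceil}^{k} \mathbb{P}\left(g(k,m) \ge \frac{\log \epsilon}{k},\ M_k = m\right),
\end{aligned} \tag{22}$$

where $g(k,m) := \frac{m}{2k} \log\left(1 - c\,\delta^{2k/m}\right)$, for $m > 0$, and $\lceil x \rceil$ denotes the smallest integer not less than $x$. For fixed $k$, each of the probabilities in the sum above is equal to $0$ for those $m$ such that $g(k,m) < \frac{\log \epsilon}{k}$. This yields:

$$\sum_{m = \lceil \alpha k \rceil}^{k} \mathbb{P}\left(g(k,m) \ge \frac{\log \epsilon}{k},\ M_k = m\right) \le \sum_{m = \lceil \alpha k \rceil}^{k} s(k,m), \tag{23}$$

where $s(k,m)$ is the switch function defined by:

$$s(k,m) := \begin{cases} 0, & \text{if } g(k,m) < \frac{\log \epsilon}{k}, \\ 1, & \text{otherwise.} \end{cases}$$

Also, as $g(k, \cdot)$ is, for fixed $k$, decreasing in $m$, it follows that $s(k,m) \le s(k, \alpha k)$ for $m \ge \alpha k$. Combining this with eqs. (22) and (23), we get:

$$\mathbb{P}\left(\left\|\widetilde{\Phi}(k,0)\right\| \ge \epsilon,\ M_k \ge \alpha k\right) \le \left(k - \lceil \alpha k \rceil + 1\right) s(k, \alpha k).$$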
We now show that $s(k, \alpha k)$ will eventually become $0$, as $k$ increases, which would yield part 1 of Lemma 15. To show this, we observe that $g$ has a constant negative value at $(k, \alpha k)$:

$$g(k, \alpha k) = \frac{\alpha}{2} \log\left(1 - c\,\delta^{2/\alpha}\right).$$

Since $\frac{1}{k} \log \epsilon \to 0$, as $k \to \infty$, there exists $k_0 = k_0(\alpha, \epsilon)$ such that $g(k, \alpha k) < \frac{1}{k} \log \epsilon$, for every $k \ge k_0$. Thus, $s(k, \alpha k) = 0$ for every $k \ge k_0$. This completes the proof of part 1.
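The threshold $k_0(\alpha, \epsilon)$ in part 1 can be made explicit. The following is a minimal sketch, assuming $0 < c\,\delta^{2/\alpha} < 1$ and ignoring the rounding of $\alpha k$ to an integer; the function names and the numerical values of $c$ and $\delta$ are hypothetical and chosen only for illustration.

```python
import math


def g_alpha(alpha: float, c: float, delta: float) -> float:
    """g(k, alpha*k) = (alpha/2) * log(1 - c * delta**(2/alpha)); it does not depend on k."""
    return 0.5 * alpha * math.log(1.0 - c * delta ** (2.0 / alpha))


def k0(alpha: float, eps: float, c: float, delta: float) -> int:
    """Smallest integer k >= 1 such that g(k, alpha*k) < log(eps) / k."""
    g = g_alpha(alpha, c, delta)  # strictly negative under the stated assumption
    # k * g < log(eps)  <=>  k > log(eps) / g, because dividing by g < 0 flips the inequality.
    return math.floor(math.log(eps) / g) + 1


# Hypothetical constants (not taken from the paper).
print(k0(alpha=1.0, eps=0.1, c=1.0, delta=0.5))
```

Because $g(k, \alpha k)$ is constant in $k$ while $\frac{1}{k}\log\epsilon$ increases to $0$, the inequality holds for every $k$ at or above the returned value.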
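To prove part 2, we observe that

$$\mathbb{P}\left(\left\|\widetilde{\Phi}(k,0)\right\| \ge \epsilon,\ M_k < \alpha k\right) \le \mathbb{P}\left(M_k < \alpha k\right) = \sum_{m=0}^{\lceil \alpha k \rceil - 1} \mathbb{P}\left(M_k = m\right). \tag{24}$$

Recalling the definition of $M_k$, we have $\{M_k = m\} = \{T_m \le k,\ T_{m+1} > k\}$, for $m \ge 0$; this, by further considering all possible realizations of $T_i$, $i \le m$, yields

$$\mathbb{P}\left(M_k = m\right) = \sum_{1 \le t_1 < \ldots < t_m \le k} \mathbb{P}\left(T_i = t_i, \text{ for } 1 \le i \le m,\ T_{m+1} > k\right), \tag{25}$$

where the summation is over all possible realizations $T_i = t_i$, $i = 1, \ldots, m$. Next, we remark that, by definition of stopping times $T_i$, the supergraph $\Gamma(T_i - 1, T_{i-1})$ is disconnected with probability 1, for $i \le m$ ($T_i$ is defined as the first time $t$ after time $T_{i-1}$ when the supergraph $\Gamma(t, T_{i-1})$ becomes connected); similarly, if $T_{m+1} > k$, then $\Gamma(k, T_m)$ is disconnected. Fixing the realizations $T_i = t_i$, $i \le m$, this implies

$$\begin{aligned}
\mathbb{P}\left(T_i = t_i, \text{ for } i \le m,\ T_{m+1} > k\right)
&\le \mathbb{P}\left(\Gamma(t_i - 1, t_{i-1}) \text{ is disconnected, for } i \le m+1\right) \\
&= \prod_{i=1}^{m+1} \mathbb{P}\left(\Gamma(t_i - 1, t_{i-1}) \text{ is disconnected}\right),
\end{aligned} \tag{26}$$

where $t_{m+1} := k + 1$ and the equality follows by the independence of the graph realizations. Recalling Observation 8 and the definition of $p_{\max}$, we further have, for $i \le m + 1$,

$$\mathbb{P}\left(\Gamma(t_i - 1, t_{i-1}) \text{ is disconnected}\right) = \mathbb{P}\left(\bigcup_{\mathcal{H} \in \Pi^\star(\mathcal{G})} \left\{G_t \in \mathcal{H},\ t_{i-1} < t < t_i\right\}\right) \le |\Pi^\star(\mathcal{G})|\, p_{\max}^{t_i - t_{i-1} - 1},$$

which, combined with (26), yields:

$$\mathbb{P}\left(T_i = t_i, \text{ for } i \le m,\ T_{m+1} > k\right) \le |\Pi^\star(\mathcal{G})|^{m+1}\, p_{\max}^{k - m}. \tag{27}$$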

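The bound in (27) holds for any realization $T_i = t_i$, $1 \le t_1 < \ldots < t_m \le k$, of the first $m$ stopping times. Since the number of these realizations is $\binom{k}{m} \le \left(\frac{ke}{m}\right)^{m}$ (see eq. (25)), we obtain the following bound for the probability of the event $M_k = m$, where $m \le \lceil \alpha k \rceil - 1$:

$$\mathbb{P}\left(M_k = m\right) \le \left(\frac{ke}{m}\right)^{m} |\Pi^\star(\mathcal{G})|^{m+1}\, p_{\max}^{k - m}. \tag{28}$$

Finally, as the function $h(m) := \left(\frac{ke}{m}\right)^{m} |\Pi^\star(\mathcal{G})|^{m+1}\, p_{\max}^{k-m}$, which upper bounds the probability of the event $M_k = m$, is increasing for $m \le k$, combining (24) and (28), we get

$$\mathbb{P}\left(\left\|\widetilde{\Phi}(k,0)\right\| \ge \epsilon,\ M_k < \alpha k\right) \le \sum_{m=0}^{\lceil \alpha k \rceil - 1} h(m) \le \lceil \alpha k \rceil \left(\frac{ke}{\lceil \alpha k \rceil - 1}\right)^{\lceil \alpha k \rceil - 1} |\Pi^\star(\mathcal{G})|^{\lceil \alpha k \rceil}\, p_{\max}^{k - (\lceil \alpha k \rceil - 1)}. \tag{29}$$

Taking the log, dividing by $k$, and taking the $\limsup_{k \to \infty}$ yields part 2 of Lemma 15:

$$\limsup_{k \to \infty} \frac{1}{k} \log \mathbb{P}\left(\left\|\widetilde{\Phi}(k,0)\right\| \ge \epsilon,\ M_k < \alpha k\right) \le \alpha \log \frac{e}{\alpha} + \alpha \log |\Pi^\star(\mathcal{G})| + (1 - \alpha) \log p_{\max}. \tag{30}$$

Here $\alpha \log \frac{e}{\alpha} = \alpha - \alpha \log \alpha$; the additional term $\alpha$ vanishes as $\alpha \to 0$, so the infimum over $\alpha \in (0,1]$ is the same as for the right-hand side of (20).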



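To complete the proof of the upper bound (9), it remains to observe the following:

$$\begin{aligned}
\mathbb{P}\left(\left\|\widetilde{\Phi}(k,0)\right\| \ge \epsilon\right)
&= \mathbb{P}\left(\left\|\widetilde{\Phi}(k,0)\right\| \ge \epsilon,\ M_k < \alpha k\right) + \mathbb{P}\left(\left\|\widetilde{\Phi}(k,0)\right\| \ge \epsilon,\ M_k \ge \alpha k\right) \\
&= \mathbb{P}\left(\left\|\widetilde{\Phi}(k,0)\right\| \ge \epsilon,\ M_k < \alpha k\right), \quad \text{for } k \ge k_0(\alpha, \epsilon),
\end{aligned}$$

where the last equality follows by part 1 of Lemma 15. Thus,

$$\begin{aligned}
\limsup_{k \to \infty} \frac{1}{k} \log \mathbb{P}\left(\left\|\widetilde{\Phi}(k,0)\right\| \ge \epsilon\right)
&= \limsup_{k \to \infty} \frac{1}{k} \log \mathbb{P}\left(\left\|\widetilde{\Phi}(k,0)\right\| \ge \epsilon,\ M_k < \alpha k\right) \\
&\le -\alpha \log \alpha + \alpha \log |\Pi^\star(\mathcal{G})| + (1 - \alpha) \log p_{\max}.
\end{aligned} \tag{31}$$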



Since, by part 2 of Lemma 15, inequality (31) holds for every $\alpha \in (0,1]$, taking the $\inf_{\alpha \in (0,1]}$ yields the upper bound (9). This completes the proof of Theorem 9 for the case when $\Pi^\star(\mathcal{G})$ is nonempty. We now consider the case when $\Pi^\star(\mathcal{G}) = \emptyset$. In this case each realization of $G_t$ is connected (otherwise, $\Pi(\mathcal{G})$ would contain at least this disconnected realization). Applying Corollary 12 to successive graph realizations (i.e., for $s = t + 1$) we get that

$$\left\|\widetilde{\Phi}(k,0)\right\| \le \left(1 - c\,\delta^{2}\right)^{k/2}.$$

For any $\epsilon > 0$, there exists $k_1 = k_1(\epsilon)$ such that the right hand side is smaller than $\epsilon$ for all $k \ge k_1$. This implies that, for any $k \ge k_1$, the norm $\left\|\widetilde{\Phi}(k,0)\right\|$ is smaller than $\epsilon$ with probability 1, thus yielding the rate $I = \infty$ in Theorem 9 for the case when $\Pi^\star(\mathcal{G}) = \emptyset$. This completes the proof of Theorem 9.
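The effect of taking the infimum over $\alpha$ in the bound (31) can be seen numerically. The sketch below evaluates the right-hand side of (31) on a grid of $\alpha$ values; the function name `rate_bound` and the values of $|\Pi^\star(\mathcal{G})|$ and $p_{\max}$ are hypothetical and serve only as an illustration.

```python
import math


def rate_bound(alpha: float, n_star: int, p_max: float) -> float:
    """Right-hand side of (31): -a*log(a) + a*log|Pi*| + (1 - a)*log(p_max)."""
    return (-alpha * math.log(alpha)
            + alpha * math.log(n_star)
            + (1.0 - alpha) * math.log(p_max))


# Hypothetical values: 5 maximal disconnected collections, p_max = 0.6.
n_star, p_max = 5, 0.6
for j in range(7):
    alpha = 10.0 ** (-j)
    print(f"alpha = {alpha:g}   bound = {rate_bound(alpha, n_star, p_max):.6f}")
print(f"log p_max = {math.log(p_max):.6f}")
```

The bound exceeds $\log p_{\max}$ by $\alpha \log\frac{|\Pi^\star(\mathcal{G})|}{\alpha\, p_{\max}} > 0$, which tends to zero as $\alpha \to 0$, so the infimum equals $\log p_{\max}$ and is approached but not attained on $(0,1]$.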
IV. Computation of the exponential rate of consensus via min-cut: Gossip and link failure models

Motivated by the applications of averaging in sensor networks and distributed dynamical systems, we consider two frequently used types of random averaging models: gossip and link failure models. For a generic graph $G = (V, E)$, we show that $p_{\max}$ for both models can be found by solving an instance of a min-cut problem over the same graph $G$. The corresponding link costs are simple functions of the link occurrence probabilities. In this section, we detail the relation between the min-cut problem and the computation of $p_{\max}$.

We now state Lemma 16 on the computation of $p_{\max}$, which holds for the general random graph process and will later help us calculate $p_{\max}$ for the gossip and link failure models. Lemma 16 assures that $p_{\max}$ can be found by relaxing the search space from $\Pi^\star(\mathcal{G})$, the set of maximally disconnected collections, to $\Pi(\mathcal{G})$, the set of all disconnected collections.
Lemma 16

$$p_{\max} = \max_{\mathcal{H} \in \Pi(\mathcal{G})} p_{\mathcal{H}}. \tag{32}$$

Proof: Since $\Pi^\star(\mathcal{G}) \subseteq \Pi(\mathcal{G})$, to show (32) it suffices to show that for any $\mathcal{H} \in \Pi(\mathcal{G})$ there exists $\mathcal{H}' \in \Pi^\star(\mathcal{G})$ such that $p_{\mathcal{H}'} \ge p_{\mathcal{H}}$. To this end, pick an arbitrary $\mathcal{H} \in \Pi(\mathcal{G})$ and recall Observation 8. Then, there exists $\mathcal{H}' \in \Pi^\star(\mathcal{G})$ such that $\mathcal{H} \subseteq \mathcal{H}'$, which implies that

$$p_{\mathcal{H}} = \sum_{G \in \mathcal{H}} \mathbb{P}(G_t = G) \le \sum_{G \in \mathcal{H}'} \mathbb{P}(G_t = G) = p_{\mathcal{H}'},$$

and proves (32). ■

Before calculating the rate $I$ for gossip and link failure models, we explain the minimum cut (min-cut) problem.

Minimum cut (min-cut) problem. Given an undirected weighted graph $G = (V, E, C)$ where $V$ is the set of $N$ nodes, $E$ is the set of edges, and $C = [c_{ij}]$ is the $N \times N$ matrix of the nonnegative edge costs; by convention, we set $c_{ii} = 0$, for all $i$, and $c_{ij} = 0$, for $\{i,j\} \notin E$. The min-cut problem is to find the subset of edges $E'$ such that $G' = (V, E \setminus E')$ is disconnected and the sum $\sum_{\{i,j\} \in E'} c_{ij}$ is the minimal possible; we denote this minimal value, also referred to as the connectivity, by $\mathrm{mincut}(V, E, C)$. The min-cut problem is easy to solve, and there exist efficient algorithms to solve it, e.g., [40], [8].
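For small graphs, the definition above can be evaluated by exhaustive search over node bipartitions, since with nonnegative costs a minimal disconnecting edge set is always the set of edges crossing some bipartition. The sketch below is this brute-force enumeration, not the efficient algorithms of [40], [8]; the function name `mincut`, the edge encoding as `frozenset` pairs, and the example costs are assumptions made for illustration.

```python
from itertools import combinations


def mincut(nodes, cost):
    """Global min-cut value by enumerating bipartitions (exponential time; small graphs only).

    `cost` maps frozenset({i, j}) to a nonnegative edge cost; absent pairs cost 0.
    """
    nodes = list(nodes)
    first, rest = nodes[0], nodes[1:]
    best = float("inf")
    # One side always contains `first`; taking fewer than len(rest) extra nodes
    # keeps the other side nonempty, so each bipartition is visited exactly once.
    for r in range(len(rest)):
        for extra in combinations(rest, r):
            side = {first, *extra}
            weight = sum(c for edge, c in cost.items() if len(edge & side) == 1)
            best = min(best, weight)
    return best


# Hypothetical 4-node example with integer costs.
example_cost = {
    frozenset({0, 1}): 5, frozenset({1, 2}): 2, frozenset({2, 3}): 4,
    frozenset({0, 3}): 3, frozenset({0, 2}): 1,
}
print(mincut(range(4), example_cost))  # 6, obtained by separating {0, 1} from {2, 3}
```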

A. Gossip model 

Consider the network of $N$ nodes, collected in the set $V$ and with the set $E \subseteq \binom{V}{2}$ defining communication links between the nodes, such that if $\{i,j\} \in E$ then nodes $i, j \in V$ can communicate. In the gossip algorithm, only one link $\{i,j\} \in E$ is active at a time. Let $p_{ij}$ be the probability of occurrence of link $\{i,j\} \in E$:

$$p_{ij} = \mathbb{P}\left(G_t = (V, \{i,j\})\right). \tag{33}$$

We note that $\sum_{\{i,j\} \in E} p_{ij} = 1$.

Lemma 17 Consider a gossip model on a graph $G = (V, E)$ with link probabilities $p_{ij}$, $\{i,j\} \in E$. Construct a min-cut problem instance with the graph $G$ and the cost assigned to link $\{i,j\}$ equal to $p_{ij}$. Then:

$$p_{\max}^{\mathrm{Gossip}}(V, E, P) = 1 - \mathrm{mincut}(V, E, P) \tag{34}$$

$$I^{\mathrm{Gossip}}(V, E, P) = -\log\left(1 - \mathrm{mincut}(V, E, P)\right), \tag{35}$$

where $P$ is the symmetric matrix that collects the link occurrence probabilities, $P_{ij} = p_{ij}$, $\{i,j\} \in E$, $P_{ii} = 0$, for $i = 1, \ldots, N$, and $P_{ij} = 0$, $\{i,j\} \notin E$.

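Proof: For the gossip model, the set of all possible graph realizations $\mathcal{G}^{\mathrm{Gossip}}$ is the set of all one-link subgraphs of $(V, E)$:

$$\mathcal{G}^{\mathrm{Gossip}} = \left\{(V, \{i,j\}) : \{i,j\} \in E\right\}. \tag{36}$$

Also, there is a one to one correspondence between the set of collections of realizable graphs and the set of subgraphs of $G$: a collection $\mathcal{H}$ corresponds to the subgraph $H$ of $G$ if and only if $H = \Gamma(\mathcal{H})$. Thus, if we assign to each link in $G$ a cost equal to $p_{ij}$, then searching over the set $\Pi(\mathcal{G})$ of all disconnected collections to find the most likely one is equivalent to searching over all disconnected subgraphs of $G$ with the maximal total cost:

$$p_{\max}^{\mathrm{Gossip}} = \max_{\mathcal{H} \in \Pi(\mathcal{G}^{\mathrm{Gossip}})} p_{\mathcal{H}} = \max_{E' \subseteq E,\ (V, E') \text{ is disc.}} \ \sum_{\{i,j\} \in E'} p_{ij}. \tag{37}$$

Using the fact that $\sum_{\{i,j\} \in E} p_{ij} = 1$, eq. (37) can be written as:

$$\max_{E' \subseteq E,\ (V, E') \text{ is disc.}} \ \sum_{\{i,j\} \in E'} p_{ij} = \max_{F \subseteq E,\ (V, E \setminus F) \text{ is disc.}} \left(1 - \sum_{\{i,j\} \in F} p_{ij}\right) \tag{38}$$

$$= 1 - \min_{F \subseteq E,\ (V, E \setminus F) \text{ is disc.}} \ \sum_{\{i,j\} \in F} p_{ij}. \tag{39}$$

The minimization problem in the last equation is the min-cut problem $\mathrm{mincut}(V, E, P)$. ■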
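The identity in Lemma 17 can be checked on a small instance by comparing $1 - \mathrm{mincut}(V, E, P)$ with a direct search over all disconnected subgraphs, as in (37). The sketch below does this by brute force; all function names and the link probabilities on the 4-cycle are hypothetical choices for illustration.

```python
import math
from itertools import combinations


def mincut(nodes, cost):
    """Global min-cut value by enumerating bipartitions (small graphs only)."""
    nodes = list(nodes)
    first, rest = nodes[0], nodes[1:]
    best = float("inf")
    for r in range(len(rest)):
        for extra in combinations(rest, r):
            side = {first, *extra}
            best = min(best, sum(c for e, c in cost.items() if len(e & side) == 1))
    return best


def is_connected(nodes, edges):
    """Depth-first search connectivity test on the graph (nodes, edges)."""
    nodes = list(nodes)
    seen, stack = {nodes[0]}, [nodes[0]]
    while stack:
        v = stack.pop()
        for e in edges:
            if v in e:
                (w,) = e - {v}  # the other endpoint of the link
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
    return len(seen) == len(nodes)


def p_max_by_enumeration(nodes, prob):
    """Largest total probability of a link subset E' with (V, E') disconnected, as in (37)."""
    nodes, edges = list(nodes), list(prob)
    best = 0.0
    for r in range(len(edges) + 1):
        for subset in combinations(edges, r):
            if not is_connected(nodes, subset):
                best = max(best, sum(prob[e] for e in subset))
    return best


# Hypothetical gossip model on a 4-cycle; the link probabilities sum to 1.
prob = {frozenset({0, 1}): 0.1, frozenset({1, 2}): 0.2,
        frozenset({2, 3}): 0.3, frozenset({3, 0}): 0.4}
p_max = 1.0 - mincut(range(4), prob)          # eq. (34)
rate = -math.log(p_max)                       # eq. (35)
print(p_max, p_max_by_enumeration(range(4), prob), rate)
```

In this instance the cheapest cut removes the two least likely links, so the most likely disconnected subgraph keeps the two most likely ones.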
Gossip on a regular network. We now consider a special case of the uniform gossip model on a connected regular graph with degree $d$, $d = 2, \ldots, N - 1$, and the uniform link occurrence probability $p := p_{ij} = \frac{2}{Nd}$. It can be easily seen that the value of the min-cut is $p$ times the minimal number of edges that disconnects the graph, which equals $pd = 2/N$; this corresponds to cutting all the edges of a fixed node, i.e., isolating a fixed node. Hence,

$$p_{\max} = \mathbb{P}(\text{node } i \text{ is isolated}) = 1 - 2/N,$$

$$I = -\log(1 - 2/N).$$

Note that the asymptotic rate $I$ is determined by the probability that a fixed node is isolated, and that the rate $I$ does not depend on the degree $d$.

B. Link failure model 

As with the gossip model, we introduce a graph $G = (V, E)$ to model the communication links between the nodes. In contrast with the gossip model, the link failure model assumes that each feasible link $\{i,j\} \in E$ occurs independently of all the other links in the network. Let again $p_{ij}$ denote the probability of occurrence of link $\{i,j\} \in E$. (Remark that, due to the independence assumption, we now do not have any condition on the link occurrence probabilities $p_{ij}$.)

Lemma 18 Consider a link failure model on a graph $G = (V, E)$ with link probabilities $p_{ij}$, $\{i,j\} \in E$. Construct a min-cut problem instance with the graph $G$ and the cost of link $\{i,j\}$ equal to $-\log(1 - p_{ij})$. Then:

$$p_{\max}^{\mathrm{Link\ fail.}}(V, E, P) = e^{-\mathrm{mincut}(V, E, -\log(1 - P))} \tag{40}$$

$$I^{\mathrm{Link\ fail.}}(V, E, P) = \mathrm{mincut}(V, E, -\log(1 - P)), \tag{41}$$

where $P$ is the symmetric matrix that collects the link occurrence probabilities, $P_{ij} = p_{ij}$, $\{i,j\} \in E$, $P_{ii} = 0$, for $i = 1, \ldots, N$, and $P_{ij} = 0$, $\{i,j\} \notin E$, and $\log X$ denotes the entrywise logarithm of a matrix $X$.

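Proof: Since the links occur independently, any subgraph $H = (V, E')$ of $G$ can occur at a given time, therefore yielding that the collection of realizable graphs $\mathcal{G}^{\mathrm{Link\ fail.}}$ is the collection of all subgraphs of $G$:

$$\mathcal{G}^{\mathrm{Link\ fail.}} = \left\{(V, E') : E' \in 2^{E}\right\}; \tag{42}$$

here $2^{E}$ denotes the power set of $E$, i.e., the collection of all possible subsets of the set of feasible links $E$.

This implies that for any fixed set $F \subseteq E$ of edges that disconnect $G = (V, E)$ we can find a disconnected collection $\mathcal{H} \subseteq \mathcal{G}$ such that $\Gamma(\mathcal{H}) = (V, E \setminus F)$ (recall that $\Gamma(\mathcal{H})$ is the minimal supergraph of all the graphs contained in $\mathcal{H}$). On the other hand, any disconnected collection will map by $\Gamma$ to one disconnected subgraph of $G$. Therefore, in order to find $p_{\max}^{\mathrm{Link\ fail.}}$ we can split the search over disconnected collections $\mathcal{H}$ as follows:

$$p_{\max}^{\mathrm{Link\ fail.}} = \max_{\mathcal{H} \subseteq \mathcal{G},\ \Gamma(\mathcal{H}) \text{ is disc.}} p_{\mathcal{H}} = \max_{F \subseteq E,\ F \text{ disconnects } (V, E)} \ \max_{\mathcal{H} \subseteq \mathcal{G},\ \Gamma(\mathcal{H}) = (V, E \setminus F)} p_{\mathcal{H}}. \tag{43}$$

Next, we fix a disconnecting set of edges $F \subseteq E$ and consider all $\mathcal{H} \subseteq \mathcal{G}$ such that $\Gamma(\mathcal{H}) = (V, E \setminus F)$. We claim that, among all such collections, the one with maximal probability is $\mathcal{H}_F := \left\{(V, E') : E' \subseteq E \setminus F\right\}$. To show this, we observe that if $H = (V, E') \in \mathcal{H}$, then $E' \cap F = \emptyset$, thus implying:

$$p_{\mathcal{H}} = \sum_{H \in \mathcal{H}} \mathbb{P}(G_t = H) \le \sum_{H = (V, E') :\ E' \subseteq E,\ E' \cap F = \emptyset} \mathbb{P}(G_t = H) = p_{\mathcal{H}_F}.$$

Therefore, the expression in (43) simplifies to:

$$\max_{F \subseteq E,\ F \text{ disconnects } G = (V, E)} p_{\mathcal{H}_F}.$$

We next compute $p_{\mathcal{H}_F}$ for given $F \subseteq E$:

$$\begin{aligned}
p_{\mathcal{H}_F} &= \mathbb{P}\left(E(G_t) \cap F = \emptyset\right) \\
&= \mathbb{P}\left(\{i,j\} \notin E(G_t), \text{ for all } \{i,j\} \in F\right) \\
&= \prod_{\{i,j\} \in F} (1 - p_{ij}),
\end{aligned}$$

where the last equality follows by the independence assumption on the link occurrence probabilities. This implies that $p_{\max}^{\mathrm{Link\ fail.}}$ can be computed by

$$\begin{aligned}
p_{\max}^{\mathrm{Link\ fail.}} &= \max_{F \subseteq E,\ F \text{ disconnects } G = (V, E)} \ \prod_{\{i,j\} \in F} (1 - p_{ij}) \\
&= e^{-\min_{F \subseteq E,\ F \text{ disconnects } (V, E)} \sum_{\{i,j\} \in F} -\log(1 - p_{ij})} \\
&= e^{-\mathrm{mincut}(V, E, -\log(1 - P))}.
\end{aligned}$$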
Regular graph and uniform link failures. We now consider the special case when the underlying graph is a connected regular graph with degree $d$, $d = 2, \ldots, N - 1$, and the uniform link occurrence probabilities $p_{ij} = p$. It is easy to see that $p_{\max}$ and $I$ simplify to:

$$p_{\max} = \mathbb{P}(\text{node } i \text{ is isolated}) = (1 - p)^{d},$$

$$I = -d \log(1 - p).$$
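The link failure rate of Lemma 18 can be computed in the same way as for gossip, with link cost $-\log(1 - p_{ij})$. The sketch below uses a brute-force min-cut, suitable only for small graphs, and checks the closed form $I = -d\log(1-p)$ on a ring ($d = 2$); the function names and the values $N = 5$, $p = 0.5$ are hypothetical.

```python
import math
from itertools import combinations


def mincut(nodes, cost):
    """Global min-cut value by enumerating bipartitions (small graphs only)."""
    nodes = list(nodes)
    first, rest = nodes[0], nodes[1:]
    best = float("inf")
    for r in range(len(rest)):
        for extra in combinations(rest, r):
            side = {first, *extra}
            best = min(best, sum(c for e, c in cost.items() if len(e & side) == 1))
    return best


def link_failure_rate(nodes, prob):
    """I = mincut(V, E, -log(1 - P)), eq. (41); requires every p_ij < 1."""
    cost = {e: -math.log1p(-p) for e, p in prob.items()}  # -log(1 - p_ij)
    return mincut(nodes, cost)


# Hypothetical example: ring of N = 5 nodes (regular, d = 2), uniform p = 0.5.
N, p = 5, 0.5
ring = {frozenset({i, (i + 1) % N}): p for i in range(N)}
rate = link_failure_rate(range(N), ring)
p_max = math.exp(-rate)                       # eq. (40)
print(rate, -2 * math.log(1 - p), p_max)
```

On the ring every cut contains at least two links, so the rate is $2\log 2$ and $p_{\max} = (1-p)^2 = 0.25$, in agreement with the closed form.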

V. Application: Optimal power allocation for distributed detection 

We now demonstrate the usefulness of Theorem 9 by applying it to the consensus+innovations distributed detection in [24], [23] over networks with symmetric fading links. We summarize the results in the current section. We first show that the asymptotic performance (exponential decay rate of the error probability) of distributed detection explicitly depends on the rate of consensus $|\log p_{\max}|$. Further, we note that $|\log p_{\max}|$ is a function of the link fading (failure) probabilities, and, consequently, of the sensors' transmission power. We exploit this fact to formulate the optimization problem of minimizing the transmission power subject to a lower bound on the guaranteed detection performance; the latter translates into the requirement that $|\log p_{\max}|$ exceeds a threshold. We show that the corresponding optimization problem is convex. Finally, we illustrate by simulation the significant gains of the optimal transmission power allocation over the uniform transmission power allocation.

A. Consensus+innovations distributed detection 

Detection problem. We now briefly explain the distributed detection problem that we consider. We consider a network of $N$ sensors that cooperate to detect an event of interest, i.e., face a binary hypothesis test $H_1$ versus $H_0$. Each sensor $i$, at each time step $t$, $t = 1, 2, \ldots$, performs a measurement $Y_i(t)$. We assume that the measurements are i.i.d., both in time and across sensors, where under hypothesis $H_l$, $Y_i(t)$ has the density function $f_l$, $l = 0, 1$, for $i = 1, \ldots, N$ and $t = 1, 2, \ldots$

Consensus+innovations distributed detector. To resolve between the two hypotheses, each sensor $i$ maintains over time $k$ its local decision variable $x_{i,k}$ and compares it with a threshold; if $x_{i,k} > 0$, sensor $i$ accepts $H_1$; otherwise, it accepts $H_0$. Sensor $i$ updates its decision variable $x_{i,k}$ by exchanging the decision variable locally with its neighbors, by computing the weighted average of its own and the neighbors' variables, and by incorporating its new measurement through a log-likelihood ratio $L_{i,k} = \log \frac{f_1(Y_i(k))}{f_0(Y_i(k))}$:

$$x_{i,k} = \sum_{j \in O_{i,k}} W_{ij,k} \left(\frac{k-1}{k}\, x_{j,k-1} + \frac{1}{k}\, L_{j,k}\right), \quad k = 1, 2, \ldots, \quad x_{i,0} = 0. \tag{44}$$

Here $O_{i,k}$ is the (random) neighborhood of sensor $i$ at time $k$ (including $i$), and $W_{ij,k}$ is the (random) averaging weight that sensor $i$ assigns to sensor $j$ at time $k$.

Let $x_k = (x_{1,k}, x_{2,k}, \ldots, x_{N,k})^\top$ and $L_k = (L_{1,k}, \ldots, L_{N,k})^\top$. Also, collect the averaging weights $W_{ij,k}$ in the $N \times N$ matrix $W_k$, where, clearly, $W_{ij,k} = 0$ if the sensors $i$ and $j$ do not communicate at time step $k$. Then, using the definition of $\Phi(k,t)$ at the beginning of Section III, writing (44) in matrix form, and unwinding the recursion, we get:

$$x_k = \frac{1}{k} \sum_{t=1}^{k} \Phi(k, t-1)\, L_t, \quad k = 1, 2, \ldots \tag{45}$$
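The step from the recursion (44) to the unrolled form (45) can be checked numerically. The sketch below assumes that $\Phi(k, t-1)$ denotes the product $W_k W_{k-1} \cdots W_t$ and uses randomly drawn gossip-type averaging matrices and synthetic values in place of the log-likelihood ratios; all function names and data are hypothetical.

```python
import random


def matvec(W, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def random_gossip_matrix(n, rng):
    """Symmetric stochastic matrix that averages the states of two random nodes."""
    i, j = rng.sample(range(n), 2)
    W = [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]
    W[i][i] = W[j][j] = W[i][j] = W[j][i] = 0.5
    return W


def run_recursion(Ws, Ls):
    """Iterate (44) in matrix form: x_k = W_k ((k-1)/k * x_{k-1} + L_k / k), x_0 = 0."""
    x = [0.0] * len(Ls[0])
    for k, (W, L) in enumerate(zip(Ws, Ls), start=1):
        x = matvec(W, [((k - 1) / k) * xj + Lj / k for xj, Lj in zip(x, L)])
    return x


def run_unrolled(Ws, Ls):
    """Evaluate (45): x_k = (1/k) * sum_t W_k ... W_t L_t."""
    k, n = len(Ws), len(Ls[0])
    total = [0.0] * n
    for t in range(1, k + 1):
        v = Ls[t - 1]
        for s in range(t, k + 1):   # apply W_t first, then W_{t+1}, ..., W_k
            v = matvec(Ws[s - 1], v)
        total = [a + b for a, b in zip(total, v)]
    return [a / k for a in total]


rng = random.Random(0)
Ws = [random_gossip_matrix(4, rng) for _ in range(6)]
Ls = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(6)]
print(run_recursion(Ws, Ls))
print(run_unrolled(Ws, Ls))
```

Both computations agree up to floating point error because multiplying (44) by $k$ gives $k\,x_k = W_k\left((k-1)\,x_{k-1} + L_k\right)$, which telescopes into the sum in (45).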

Equation (45) shows the significance of the matrices $\Phi(k,t)$ to the distributed detection performance, and, in particular, the significance of how close the matrices $\Phi(k,t)$ are to $J$. Indeed, when $\Phi(k,t) = J$, the contribution of $L_t$ to $x_{i,k}$ is $[\Phi(k,t) L_t]_i = \frac{1}{N} \sum_{j=1}^{N} L_{j,t}$, hence sensor $i$ effectively uses the local likelihood ratios of all the sensors. At the other extreme, when $\Phi(k,t) = I$, $[\Phi(k,t) L_t]_i = L_{i,t}$, and hence sensor $i$ effectively uses only its own likelihood ratio. In fact, it can be shown that, when the rate $I$ exceeds a certain threshold, then the asymptotic performance (the exponential decay rate of the error probability) at each sensor $i$ is optimal, i.e., equal to the exponential decay rate of the best centralized detector. Specifically, the optimality threshold depends on the sensor observations distributions $f_1$ and $f_0$ and is given by¹ (see also Figure 2):

$$I \ge I^\star(f_1, f_0, N). \tag{46}$$

Remark. Reference [24] derives a sufficient condition for the asymptotic optimality in terms of $\lambda_2(\mathbb{E}[W_k^2])$ in the form: $|\log \lambda_2(\mathbb{E}[W_k^2])| \ge I^\star(f_1, f_0, N)$, based on the inequality $\limsup_{k \to \infty} \frac{1}{k} \log \mathbb{P}\left(\|\widetilde{\Phi}(k,0)\| \ge \epsilon\right) \le \log \lambda_2(\mathbb{E}[W_k^2])$; this inequality holds for arbitrary i.i.d. averaging models and it does not require the assumption that the positive entries of $W_k$ are bounded away from zero. The sufficient condition $|\log \lambda_2(\mathbb{E}[W_k^2])| \ge I^\star(f_1, f_0, N)$ is hence readily improved by replacing the upper bound $\log \lambda_2(\mathbb{E}[W_k^2])$ with the exact limit $-I$, whenever the matrix process satisfies Assumption 1.

¹ See [24] for the precise expression of the threshold.
B. Optimal transmission power allocation

Equation (46) says that there is a sufficient rate of consensus $I^\star$ such that the distributed detector is asymptotically optimal; a further increase of $I$ above $I^\star$ does not improve the exponential decay rate of the error probability. Also, as we have shown in subsection IV-B, the rate of consensus $I$ is a function of the link occurrence probabilities, which in turn depend on the sensors' transmission power. In summary, (46) suggests that there is a sufficient (minimal required) transmission power that achieves detection with the optimal exponential decay rate. This discussion motivates us to formulate the optimal power allocation problem of minimizing the total transmission power per time $k$ subject to the optimality condition $I \ge I^\star$. Before presenting the optimization problem, we detail the inter-sensor communication model.




Fig. 2. Lower bound on the exponential decay rate of the maximal error probability across sensors versus the rate of consensus $I$ for Gaussian sensor observations $f_1 \sim \mathcal{N}(m, \sigma^2)$ and $f_0 \sim \mathcal{N}(0, \sigma^2)$.



Inter-sensor communication model. We adopt a symmetric Rayleigh fading channel model, a model similar to the one proposed in [41] (reference [41] assumes asymmetric channels). At time $k$, sensor $j$ receives from sensor $i$:

$$y_{ij,k} = g_{ij,k} \sqrt{\frac{S_{ij}}{d_{ij}^{\alpha}}}\; x_{i,k} + n_{ij,k},$$

where $S_{ij}$ is the transmission power that sensor $i$ uses for transmission to sensor $j$, $g_{ij,k}$ is the channel fading coefficient, $n_{ij,k}$ is the zero mean additive Gaussian noise with variance $\sigma_n^2$, $d_{ij}$ is the inter-sensor distance, and $\alpha$ is the path loss coefficient. We assume that the channels $(i,j)$ and $(j,i)$ at time $k$ experience the same fade, i.e., $g_{ij,k} = g_{ji,k}$; $g_{ij,k}$ is i.i.d. in time; and $g_{ij,t}$ and $g_{lm,s}$ are mutually independent for all $t, s$. We adopt the following link failure model. Sensor $j$ successfully decodes the message from sensor $i$ (i.e., the link $(i,j)$ is online) if the signal to noise ratio exceeds a threshold, i.e., if: $\mathrm{SNR} = \frac{S_{ij}\, g_{ij,k}^2}{\sigma_n^2\, d_{ij}^{\alpha}} > \tau$, or, equivalently, if $g_{ij,k}^2 > \frac{\tau \sigma_n^2 d_{ij}^{\alpha}}{S_{ij}} =: \frac{K_{ij}}{S_{ij}}$. The quantity $g_{ij,k}^2$ is, for the Rayleigh fading channel, exponentially distributed with parameter 1. Hence, we arrive at the expression for the probability of the link $(i,j)$ being online:

$$p_{ij} = \mathbb{P}\left(g_{ij,k}^2 > \frac{K_{ij}}{S_{ij}}\right) = e^{-K_{ij}/S_{ij}}. \tag{47}$$

We constrain the choice of the transmission powers by $S_{ij} = S_{ji}$², so that the link $(i,j)$ is online if and only if the link $(j,i)$ is online, i.e., the graph realizations are undirected graphs. Hence, the underlying communication model is the link failure model, with the link occurrence probabilities $p_{ij}$ in (47) that are dependent on the transmission powers $S_{ij}$.

With this model, the rate of consensus $I$ is given by (41), where the weight $c_{ij}$ associated with link $(i,j)$ is:

$$c_{ij}(S_{ij}) = -\log\left(1 - e^{-K_{ij}/S_{ij}}\right).$$

We denote by $\{S_{ij}\}$ the set of all powers $S_{ij}$, $\{i,j\} \in E$.
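The mapping from a transmission power to the link occurrence probability (47) and to the min-cut weight $c_{ij}$ is short enough to write out. The following is a minimal sketch; the function names and the numerical values of $K_{ij}$ and $S_{ij}$ are hypothetical.

```python
import math


def link_online_probability(K: float, S: float) -> float:
    """p_ij = exp(-K_ij / S_ij), eq. (47); requires S > 0."""
    return math.exp(-K / S)


def link_cost(K: float, S: float) -> float:
    """c_ij(S_ij) = -log(1 - exp(-K_ij / S_ij)), the weight used in the min-cut problem."""
    return -math.log1p(-link_online_probability(K, S))


# Hypothetical link: K_ij = 1 and unit transmission power.
K, S = 1.0, 1.0
print(link_online_probability(K, S), link_cost(K, S))
# Raising the power makes the link more reliable and raises its cut weight.
print(link_online_probability(K, 2.0 * S), link_cost(K, 2.0 * S))
```

With these values the link is online with probability $e^{-1} \approx 0.37$ and contributes a weight of about $0.46$; both quantities increase with the power $S_{ij}$.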

Lemma 19 The function $I(\{S_{ij}\}) = \mathrm{mincut}(V, E, C)$, with $c_{ij} = -\log\left(1 - e^{-K_{ij}/S_{ij}}\right)$, for $\{i,j\} \in E$, and $c_{ij} = 0$ else, is concave.

Proof: Note that the function $I(\{S_{ij}\}) = \mathrm{mincut}(V, E, C)$ can be expressed as

$$\min_{E' \subseteq E:\ G' = (V, E') \text{ is disconnected}} \ \sum_{\{i,j\} \in E \setminus E'} c_{ij}(S_{ij}).$$

On the other hand, $c_{ij}(S_{ij})$ is concave in $S_{ij}$ for $S_{ij} \ge 0$, which can be shown by computing the second derivative and noting that it is non-positive. Hence, $I(\{S_{ij}\})$ is a pointwise minimum of concave functions, and thus it is concave. ■
Power allocation problem formulation. We now formulate the optimal power allocation problem as the problem of minimizing the total transmission power used at time $k$, $2 \sum_{\{i,j\} \in E} S_{ij}$, so that the distributed detector achieves asymptotic optimality. This translates into the following optimization problem:

$$\begin{array}{ll}
\text{minimize} & \sum_{\{i,j\} \in E} S_{ij} \\
\text{subject to} & I(\{S_{ij}\}) \ge I^\star.
\end{array} \tag{48}$$

² We assumed equal noise variances $\sigma_n^2 = \mathrm{Var}(n_{ij,k}) = \mathrm{Var}(n_{ji,k})$ so that $K_{ij} = K_{ji}$, which implies the constraint $S_{ij} = S_{ji}$. Our analysis easily extends to unequal noise variances, in which case we would require $\frac{K_{ij}}{S_{ij}} = \frac{K_{ji}}{S_{ji}}$; this is not considered here.

The cost function in (48) is linear, and hence convex. Also, the constraint set $\{\{S_{ij}\} : I(\{S_{ij}\}) \ge I^\star\} = \{\{S_{ij}\} : -I(\{S_{ij}\}) \le -I^\star\}$ is convex, as a sublevel set of the convex function $-I(\{S_{ij}\})$. (See Lemma 19.) Hence, we have just proved the following Lemma.

Lemma 20 The optimization problem (48) is convex.

Convexity of (48) allows us to find a globally optimal power allocation. The next subsection demonstrates by simulation that the optimal power allocation significantly improves the performance of distributed detection over the uniform power allocation.

C. Simulation example 

We first describe the simulation setup. We consider a geometric network with $N = 14$ sensors. We place the sensors uniformly over a unit square, and connect those sensors whose distance $d_{ij}$ is less than a radius. The total number of (undirected) links is 38. (These 38 links are the failing links, for which we want to allocate the transmission powers $S_{ij}$.) We set the coefficients $K_{ij} = 6.25\, d_{ij}^{\alpha}$, with $\alpha = 2$. For the averaging weights, we use Metropolis weights, i.e., if link $\{i,j\}$ is online, we assign $W_{ij,k} = 1/(1 + \max\{d_{i,k}, d_{j,k}\})$, where $d_{i,k}$ is the degree of node $i$ at time $k$, and $W_{ij,k} = 0$ otherwise; also, $W_{ii,k} = 1 - \sum_{j \in O_{i,k},\ j \ne i} W_{ij,k}$. For the sensors' measurements, we use the Gaussian distribution $f_1 \sim \mathcal{N}(m, \sigma^2)$, $f_0 \sim \mathcal{N}(0, \sigma^2)$, with $m = 0.0447$ and $\sigma^2 = 1$. The corresponding value is $I^\star = (N - 1) N \frac{m^2}{8 \sigma^2} = 0.0455$, see [24].
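The Metropolis weight rule described above can be written as a short routine. This is a minimal sketch, assuming nodes are labeled $0, \ldots, n-1$ and each online undirected link is listed once; the function name and the three-node example are hypothetical.

```python
def metropolis_weights(n, online_edges):
    """Build the n x n Metropolis weight matrix for the links online at one time step.

    W[i][j] = 1 / (1 + max(deg_i, deg_j)) for each online link {i, j}, 0 for other
    off-diagonal entries, and W[i][i] = 1 - sum of the off-diagonal entries in row i.
    """
    degree = [0] * n
    for i, j in online_edges:
        degree[i] += 1
        degree[j] += 1
    W = [[0.0] * n for _ in range(n)]
    for i, j in online_edges:
        W[i][j] = W[j][i] = 1.0 / (1.0 + max(degree[i], degree[j]))
    for i in range(n):
        W[i][i] = 1.0 - sum(W[i])  # the diagonal entry is still 0 when the row is summed
    return W


# Hypothetical realization: path 0 - 1 - 2.
for row in metropolis_weights(3, [(0, 1), (1, 2)]):
    print(row)
```

The resulting matrix is symmetric with rows summing to one, which matches the symmetric stochastic matrices $W_k$ considered throughout the paper.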

To obtain the optimal power allocation, we solve the optimization problem (48) by applying the subgradient algorithm with constant stepsize $\beta = 0.0001$ on the unconstrained exact penalty reformulation of (48), see, e.g., [42], which is to minimize $\sum_{\{i,j\} \in E} S_{ij} + \mu \max\left\{0, -\mathrm{mincut}(V, E, C) + I^\star\right\}$, where $C = [c_{ij}]$, $c_{ij} = -\log\left(1 - e^{-K_{ij}/S_{ij}}\right)$, for $\{i,j\} \in E$, and zero else; and $\mu$ is the penalty parameter that we set to $\mu = 500$. We used the MATLAB implementation [43] of the min-cut algorithm from [40].
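To make the penalized objective concrete, the sketch below evaluates it for a given power allocation. It is only an illustration of the objective, not a reproduction of the subgradient iterations or of the MATLAB min-cut code [43]: the min-cut is computed by brute force (small graphs only), and the function names, the triangle network, and the values of $K_{ij}$ and $S_{ij}$ are hypothetical.

```python
import math
from itertools import combinations


def mincut(nodes, cost):
    """Global min-cut value by enumerating bipartitions (small graphs only)."""
    nodes = list(nodes)
    first, rest = nodes[0], nodes[1:]
    best = float("inf")
    for r in range(len(rest)):
        for extra in combinations(rest, r):
            side = {first, *extra}
            best = min(best, sum(c for e, c in cost.items() if len(e & side) == 1))
    return best


def penalized_objective(nodes, K, S, I_star, mu):
    """sum_{ij} S_ij + mu * max(0, I_star - mincut(V, E, C)), c_ij = -log(1 - exp(-K_ij/S_ij))."""
    cost = {e: -math.log1p(-math.exp(-K[e] / S[e])) for e in K}
    rate = mincut(nodes, cost)
    return sum(S.values()) + mu * max(0.0, I_star - rate)


# Hypothetical triangle network with K_ij = 1 and unit power on every link.
edges = [frozenset({0, 1}), frozenset({1, 2}), frozenset({0, 2})]
K = {e: 1.0 for e in edges}
S = {e: 1.0 for e in edges}
print(penalized_objective(range(3), K, S, I_star=0.0455, mu=500.0))  # constraint met: plain power sum
print(penalized_objective(range(3), K, S, I_star=1.0, mu=500.0))     # constraint violated: penalty added
```

When the rate constraint holds, the penalty term vanishes and the objective equals the total power; otherwise each unit of constraint violation is charged $\mu$.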
Results. Figure 3 plots the detection error probability of the worst sensor $\max_{i=1,\ldots,N} P_i(k)$ versus time $k$ for the optimal power allocation $\{S_{ij}^\star\}$ (solid blue line), and the uniform power allocation $S_{ij} = S$ across all links, such that the total power per $k$ over all links is $2 \sum_{\{i,j\} \in E} S_{ij} = 2 \sum_{\{i,j\} \in E} S_{ij}^\star =: \mathcal{S}$. We can see that the optimal power allocation scheme significantly outperforms the uniform power allocation. For example, to achieve the error probability 0.1, the optimal power allocation scheme requires about 550 time steps, hence the total consumed power is $550\,\mathcal{S}$; in contrast, the uniform power allocation needs more than $2000\,\mathcal{S}$ for the same target error 0.1, i.e., about four times more power. In addition, Figure 3 plots the detection performance for the uniform power allocation with the total power per $k$ equal to $s_r \times \mathcal{S}$, $s_r = 2, 3, 3.4$. We can see, for example, that the scheme with $s_r = 3.4$ takes about 600 time steps to achieve an error of 0.1, hence requiring about $600 \times 3.4 \times \mathcal{S} = 2040\,\mathcal{S}$. In summary, for the target error of 0.1, our optimal power allocation saves about 75% of the total power over the uniform power allocation.
Fig. 3. Detection error probability of the worst sensor versus time $k$ for the optimal and uniform power allocations, and different values of $s_r = \frac{\text{total power per } k \text{ for uniform allocation}}{\text{total power per } k \text{ for optimal allocation}}$.



VI. Conclusion 

In this paper, we found the exact exponential decay rate $I$ of the convergence in probability for products of i.i.d. symmetric stochastic matrices $W_k$. We showed that the rate $I$ depends solely on the probabilities of the graphs that underlie the matrices $W_k$. In general, calculating the rate $I$ is a combinatorial problem. However, we showed that, for the two commonly used averaging models, gossip and link failure, the rate $I$ is obtained by solving an instance of the min-cut problem, and is hence easily computable. Further, for certain simple structures, we computed the rate $I$ in closed form: for gossip over a spanning tree, $I = |\log(1 - p_{ij})|$, where $p_{ij}$ is the occurrence probability of the "weakest" link, i.e., the smallest-probability link; for both gossip and link failure models over a regular network, the rate $I = |\log p_{\mathrm{isol}}|$, where $p_{\mathrm{isol}}$ is the probability that a node is isolated from the rest of the network at a time. Intuitively, our results show that the rate $I$ is determined by the most likely way in which the network stays disconnected over a long period of time. Finally, we illustrated the usefulness of the rate $I$ by finding a globally optimal allocation of the sensors' transmission power for consensus+innovations distributed detection.